NAVAL POSTGRADUATE SCHOOL 
MONTEREY, CALIFORNIA 




THESIS 



THE DESIGN OF A 
PREDICTIVE READ CACHE 

by

Joseph R. Robert, Jr. 

March, 1996 

Thesis Advisor: Douglas J. Fouts 



Approved for public release; distribution is unlimited. 



DUDLEY KNOX LIBRARY 
NAVAL POSTGRADUATE SCHOOL
MONTEREY CA 93943-5101 



REPORT DOCUMENTATION PAGE 


Form Approved OMB No. 0704-0188 


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data
sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other
aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and
Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188),
Washington DC 20503.


1. AGENCY USE ONLY (Leave blank)

2. REPORT DATE: March 1996

3. REPORT TYPE AND DATES COVERED: Master’s Thesis


4. TITLE AND SUBTITLE 

THE DESIGN OF A PREDICTIVE READ CACHE 


5. FUNDING NUMBERS 


6. AUTHOR(S) ROBERT, Joseph Roy, Jr. 


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 
Naval Postgraduate School 
Monterey CA 93943-5000 


8. PERFORMING 
ORGANIZATION 
REPORT NUMBER 


9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 


10. SPONSORING/MONITORING 
AGENCY REPORT NUMBER 


11. SUPPLEMENTARY NOTES The views expressed in this thesis are those of the author and do not reflect the
official policy or position of the Department of Defense or the U.S. Government. 


12a. DISTRIBUTION/AVAILABILITY STATEMENT

Approved for public release; distribution is unlimited. 


12b. DISTRIBUTION CODE 


13. ABSTRACT (maximum 200 words) 

The objective of this research has been the creation of a hardware design for a Predictive Read Cache (PRC).
The PRC is a developmental cache intended to replace the second-level caches common in modern microprocessor
systems. The PRC has the potential to be faster and cheaper than current second-level caches and is distinctive
in its ability to predict the data addresses that a central processing unit will reference.

Previous research has analyzed the behavior that the PRC must exhibit. In the described research, that
behavior was modeled in the Verilog hardware description language and simulated with Verilog-XL, which takes
the Verilog behavioral model as input. The behavioral model suggests that the internal structure of the PRC
can be divided into six modules, each performing part of the function of the whole PRC. Each of these blocks
was studied for hardware equivalents, easing the development of the total structural model.

Using Verilog structural models as input, Epoch was used to automatically perform a very large-scale
integrated (VLSI) circuit layout and to generate timing information. The Epoch output files were then used for
further simulation with Verilog-XL to identify critical parts of the design. The result of this research is a complete
hardware design for the PRC.


14. SUBJECT TERMS VLSI (very large scale integrated) design; memory address
prediction; VERILOG; EPOCH; cache. 


15. NUMBER OF 
PAGES 200 

16. PRICE CODE 


17. SECURITY CLASSIFICATION OF REPORT
Unclassified 


18. SECURITY CLASSIFICATION OF THIS PAGE
Unclassified 


19. SECURITY CLASSIFICATION OF ABSTRACT
Unclassified 


20. LIMITATION OF 
ABSTRACT 
UL 



NSN 7540-01-280-5500 Standard Form 298 (Rev. 2-89) 

Prescribed by ANSI Std. Z39-18 298-102






Approved for public release; distribution is unlimited. 



THE DESIGN OF A PREDICTIVE READ CACHE 



Joseph R. Robert, Jr. 

Lieutenant, United States Navy 
B.S., State University of New York at Buffalo, 1988 

Submitted in partial fulfillment 
of the requirements for the degree of 

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING 

from the 

NAVAL POSTGRADUATE SCHOOL 
March 1996 






ABSTRACT 

The objective of this research has been the creation of a hardware design for a
Predictive Read Cache (PRC). The PRC is a developmental cache intended to replace the
second-level caches common in modern microprocessor systems. The PRC has the potential
to be faster and cheaper than current second-level caches and is distinctive in its ability to
predict the data addresses that a central processing unit will reference.

Previous research has analyzed the behavior that the PRC must exhibit. In the
described research, that behavior was modeled in the Verilog hardware description language
and simulated with Verilog-XL, which takes the Verilog behavioral model as input. The
behavioral model suggests that the internal structure of the PRC can be divided into six
modules, each performing part of the function of the whole PRC. Each of these blocks was
studied for hardware equivalents, easing the development of the total structural model.

Using Verilog structural models as input, Epoch was used to automatically perform a
very large-scale integrated (VLSI) circuit layout and to generate timing information. The
Epoch output files were then used for further simulation with Verilog-XL to identify critical
parts of the design. The result of this research is a complete hardware design for the PRC.






TABLE OF CONTENTS 



I. INTRODUCTION

A. HISTORY

B. PRINCIPLE OF OPERATION

C. RESEARCH GOALS

D. THESIS STRUCTURE

II. TESTBENCH

A. OVERVIEW OF TESTBENCH

B. SUMMARY OF '603 PROTOCOL

C. TESTBENCH

D. CPU

E. MEMORY

F. ARBITER

G. TEST RESULTS

III. PRC BEHAVIORAL MODEL DESIGN PHASE

A. PSEUDOCODE MODEL

B. DATA STRUCTURE

C. BLOCK DIAGRAM

D. CONTROLLER

E. SNOOPER

F. LINE MANAGER

G. PREDICTOR

H. DATA LIST

I. BUS INTERFACE UNIT

J. PREDICTION TESTS

K. CONCLUSION

IV. PRC STRUCTURAL MODEL DESIGN PHASE

A. PRC

B. CONTROLLER

C. SNOOPER

D. LINE MANAGER

E. PREDICTOR

F. DATA LIST

G. BUS INTERFACE

H. TESTING

V. CAD TOOLS

A. VERILOG-XL

B. CWAVES

C. EPOCH

VI. CONCLUSIONS AND RECOMMENDATIONS

A. CONCLUSIONS

B. RECOMMENDATIONS

APPENDIX A. LAYOUTS

APPENDIX B. TESTBENCH VERILOG FILES

A. TESTBENCH

B. CPU

C. ARBITER

D. MEMORY

APPENDIX C. PRC BEHAVIOR FILES

A. PRC

B. CONTROLLER

C. SNOOPER

D. LINE MANAGER

E. PREDICTOR

F. DATA LIST

G. BUS INTERFACE UNIT

H. PREDICTION TEST

I. PREDICTION TEST RESULTS

J. LINE REPLACEMENT TEST

K. LINE REPLACEMENT TEST RESULTS

APPENDIX D. PRC STRUCTURE FILES

A. PRC

B. CONTROLLER

C. SNOOPER

1. Thirty-Two-Input, Odd-Parity Checker

D. LINE MANAGER

1. Address Register With Equal Comparator

2. AND Gate With 128 Inputs and One Output

3. Codefile for Seven-to-128 Decoder (dec7to128e.codefile)

4. One-Hundred-and-Twenty-Eight-Input, Seven-Output Encoder, Priority to Low Bits

5. Thirty-Two-Input, Five-Output Encoder, Priority to Low Bits

6. Eight-Input, Three-Output Encoder, Priority to Low Bits

7. Line Replacement Unit

8. OR Gate With 128 Inputs, One Output

9. Predicted Memory Address List

10. One-to-128 Wire Splitter

11. One-to-Seven Wire Splitter

12. Set, Reset Latch

13. Set, Reset Latch Array 128 Bits Wide

E. PREDICTOR

F. DATA LIST

G. BUS INTERFACE

1. Odd Parity Checker/Generator With 256 Inputs

2. Odd Parity Generator With 32 Inputs

H. TEST RESULTS

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST






I. INTRODUCTION



A. HISTORY 

Billingsley and Fouts demonstrated the viability of using 
an address predicting buffer to reduce memory latency in 
computer systems. "The implementation of a MPB [Memory 
Prediction Buffer] is less expensive than a next-level cache 
and delivers a comparable performance enhancement." 
(Billingsley, 1992) 

With this in mind, Nowicki designed a Read Prediction 
Buffer (RPB) as part of his thesis work in 1992 (Nowicki, 
1992). This RPB was capable of prefetching data based on the 
previous pattern of memory accesses. Continuing the work of 
Nowicki, Aguilar tested that design and suggested several 
enhancements to improve it (Aguilar, 1995). A tentative 
design of this new Predictive Read Cache (PRC) was a part of 
his thesis work. 

Aguilar proposed a design consisting of six modules which 
together would comprise the PRC. He designed four of those 
six modules, testing each independently, but not together. 

B. PRINCIPLE OF OPERATION 

The Predictive Read Cache stores data only, not
instructions. The design is based on two observations
about data fetches from main memory. First, within a
specific block of data, the accesses often occur in regular
patterns, such as every element in order or every other
element in reverse order. Second, a program often uses
several blocks of data concurrently.

The PRC takes advantage of these access patterns to predict
future memory access addresses. The prediction is based on a
linear displacement of the addresses. The PRC calculates the
difference between the current address and the previous
address of a pattern, then adds that difference to the current
address to arrive at the predicted address. For example, if the
Central Processing Unit (CPU) accesses the data at address 20h
(hexadecimal 20) and then at address 40h, the PRC predicts that
the CPU soon will need the data at 60h. After predicting an
address, the PRC fetches the data from that address. Once the
data is stored in the PRC, the PRC can deliver it to the CPU
much more quickly than main memory could.
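The linear-displacement rule can be sketched in a few lines of Python. This is a behavioral illustration only, not the PRC's Verilog. The function name `predict_next` is hypothetical, and wrapping the result to a 32-bit address is an assumption.

```python
ADDRESS_BITS = 32                      # assumption: 32-bit byte addresses
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def predict_next(mrma: int, car: int) -> int:
    """Predict the next address by linear displacement.

    mrma: previous (most recent) address of the access pattern
    car:  current address
    The stride (car - mrma) is added to car; the result wraps modulo 2**32.
    """
    stride = car - mrma
    return (car + stride) & ADDRESS_MASK


if __name__ == "__main__":
    print(hex(predict_next(0x20, 0x40)))  # forward stride of 20h
    print(hex(predict_next(0x80, 0x40)))  # reverse stride of 40h
```

With the example from the text, `predict_next(0x20, 0x40)` yields 60h. A negative stride handles the reverse-order pattern in the same way.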

The PRC handles multiple data blocks through its "lines." 
Each line is capable of tracking the pattern of accesses 
within a unique block of data. Thus, the PRC can track only 
as many access patterns as it has lines. 

When the cache is full and a new access pattern begins, 
a line has to be replaced. Lines that have not been used 
recently become aged. Aged lines are the first to be replaced 
when the cache is full. 

Data incoherency is avoided through the process of 
flushing lines. When a line is flushed, that line is marked 
as containing invalid data and is made available for tracking 
new access patterns. If the CPU writes data to an address from 
which the PRC has prefetched data, the PRC flushes the line 
with that data. 
C. RESEARCH GOALS



The objective of this research is to create a complete 
hardware design of the PRC. Completing the design has 
priority over the performance, though the performance must be 
better than the performance of main memory for this design to 
be of any value. 

The performance is measured in terms of the rate at which 
the Central Processing Unit (CPU) can access the data in the 
PRC. In the microprocessor system for which this PRC design 
is created, data accesses occur in groups. The groups are
called "bursts." Each access within a burst is called a
"beat." With a 60-ns memory and a 66-MHz system clock, the
four-beat burst operation takes 8-3-3-3 cycles, that is, eight
cycles for the first beat and three more cycles for each of 
the three remaining beats. The design of the PRC must perform 
at least this well and preferably much faster. 
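The performance target can be checked with simple arithmetic. At 66 MHz one clock period is about 15.2 ns, so an 8-3-3-3 burst takes 17 cycles (about 258 ns) to move 32 bytes. The Python sketch below computes this, assuming 8 bytes per beat on the 64-bit data bus. It compares the result with the 3-1-1-1 pattern that the Testbench results suggest the PRC could supply. The helper name `burst_stats` is hypothetical.

```python
CLOCK_HZ = 66e6      # 66-MHz system clock, as stated in the text
BYTES_PER_BEAT = 8   # assumption: 64-bit data bus, 8 bytes per beat


def burst_stats(beat_cycles, clock_hz=CLOCK_HZ, bytes_per_beat=BYTES_PER_BEAT):
    """Return (total cycles, total ns, Mbytes/s) for one burst.

    beat_cycles: clock cycles spent on each beat, e.g. (8, 3, 3, 3)
    """
    period_ns = 1e9 / clock_hz
    total_cycles = sum(beat_cycles)
    total_ns = total_cycles * period_ns
    # bytes / (ns * 1e-9) / 1e6  ==  bytes * 1e3 / ns
    megabytes_per_s = len(beat_cycles) * bytes_per_beat * 1e3 / total_ns
    return total_cycles, total_ns, megabytes_per_s


if __name__ == "__main__":
    for name, pattern in (("60-ns DRAM", (8, 3, 3, 3)), ("3-1-1-1", (3, 1, 1, 1))):
        cycles, ns, mbs = burst_stats(pattern)
        print(f"{name}: {cycles} cycles, {ns:.1f} ns, {mbs:.1f} Mbytes/s")
```

For the 8-3-3-3 DRAM burst this gives roughly 124 Mbytes/s. A 3-1-1-1 burst would deliver the same 32 bytes in 6 cycles, about 352 Mbytes/s.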

D. THESIS STRUCTURE 



The Testbench is presented first, which is the Verilog 
model of the environment in which the PRC is expected to 
operate. This description includes a summary of the bus 
protocol and results of tests that show the correct 
performance of the Testbench. 

The description of the behavioral model design phase is 
presented next. This chapter presents a simple pseudocode
model of the PRC which is used to develop an appropriate data 
structure and block diagram for the PRC. The individual 
blocks are each modeled with Verilog and then connected 
together in the Testbench to verify that the entire PRC works
as desired. 

Once the behavioral model design phase is complete, each 
block is converted into a hardware (structural) model. This 
phase of the design is detailed in Chapter IV. 

This thesis also contains a description of the Computer 
Aided Design (CAD) tools used for this research. The 
descriptions include tips for making their use easier and 
descriptions of any problems encountered. 
II. TESTBENCH



This chapter describes the Testbench, the environment in 
which the Predictive Read Cache (PRC) was designed to operate. 
In particular, it summarizes the bus arbitration protocol and 
explains the important aspects of each part of the Testbench. 
The chapter concludes with the test results of the Testbench 
itself.

A. OVERVIEW OF TESTBENCH 

The Testbench models and simulates the environment in 
which the PRC design was tested. As indicated in Figure 1, it 
comprises four blocks, one of which is the PRC itself. The 
Testbench was developed with Verilog behavioral models. The 
CPU module simulates various functions of a PowerPC-603. The
Memory module simulates the behavior of a 60-ns dynamic random
access memory (DRAM). The Arbiter controls access to both the
address and data busses. Each of these modules is described 
in more detail in the following sections, after a description 
of the PowerPC-603 bus protocol. 
Figure 1. Block Diagram of Testbench.



There were four major decisions made regarding the design 
of the Testbench. The first decision was to use a PowerPC-603 
microprocessor system as the environment in which this PRC 
will operate. The work of Aguilar was started using the '603 
(Aguilar, 1995). The '603 is still a current member of the PowerPC
family, so its bus protocol should not become outdated for some
time.

The second design decision was to limit the '603 to in-order
transactions. The '603 is capable of performing certain
sequences of data transfers out of order. That is, the order 
of the data bus cycles can be different from the order of the 
address bus cycles. Prohibiting these transactions made the 
CPU model simpler and simplified the design of the PRC. This 
did not undermine the demonstration of the PRC as a viable 
memory management tool.

The third design decision was to use a 66-MHz system bus
and CPU clock rate. Sixty-six MHz is a reasonably fast system
bus speed. Designing for a slower bus speed could severely
reduce the applicability of this design to modern systems.

The fourth decision was to use the 64-bit data bus rather than
the optional 32-bit configuration. When configured with the
64-bit data bus, the PowerPC-603 can access memory in one of
two modes: single-beat or four-beat burst. A single beat is
one memory access of one to eight bytes. A four-beat burst is
a sequence of four sequential memory accesses, eight bytes per
beat, totaling 32 bytes. When configured with the 32-bit data
bus, the '603 can access memory in one of three modes:
single-beat (one to four bytes), two-beat burst (eight bytes),
or eight-beat burst (32 bytes). Data transfers are less
complicated with the 64-bit data bus since there are fewer
transfer options and fewer beats. Also, the time from one
cache miss to the next does not depend on the data bus size,
but a burst transfer on the 32-bit bus takes more cycles. The
32-bit bus therefore leaves the PRC much less time between
cache misses to do its job, perhaps too little. Further, the
32-bit mode is specific to the '603; therefore, the PRC would
have to be redesigned to be used with the other 64-bit bus
members of the PowerPC family. A disadvantage of the 64-bit
option is that it increases the number of pins required for
the PRC from about 108 to about 140.
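The bus-width argument can be made concrete with a small calculation. The sketch counts the beats needed to move one 32-byte burst and the bus cycles left between two misses. The 40-cycle miss interval is hypothetical. So is the assumption that a 32-bit burst keeps the 8-cycle first beat and 3-cycle later beats; real '603 timing in 32-bit mode may differ. The function names are also hypothetical.

```python
LINE_BYTES = 32  # every burst moves 32 bytes, on either bus width


def burst_beats(bus_width_bits: int) -> int:
    """Beats needed to move one 32-byte burst over a data bus of this width."""
    bytes_per_beat = bus_width_bits // 8
    return LINE_BYTES // bytes_per_beat


def idle_cycles(miss_interval: int, bus_width_bits: int,
                first_beat: int = 8, later_beat: int = 3) -> int:
    """Bus cycles left for the PRC between two misses (hypothetical timing)."""
    beats = burst_beats(bus_width_bits)
    burst = first_beat + later_beat * (beats - 1)
    return max(miss_interval - burst, 0)


if __name__ == "__main__":
    for width in (64, 32):
        print(width, "bits:", burst_beats(width), "beats,",
              idle_cycles(40, width), "idle cycles")
```

Under these assumptions, the 64-bit bus leaves 23 of every 40 cycles free, while the 32-bit bus leaves only 11.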

B. SUMMARY OF '603 PROTOCOL 

The PowerPC-603 has separate data and address busses,
each with independent cycles, referred to as tenures by the
Motorola engineers. Each tenure has three phases: arbitration,
transfer, and termination.



The system has a bus arbitration unit which controls the 
passing of bus mastership between the requesting units. In 
this implementation, the CPU and the PRC are the only 
candidates for bus mastership. The Arbiter module is the
arbitration unit.

When a unit wants the bus, it asserts BR_ (bus request).
If the unit can have the bus next, the arbiter asserts BG_
(bus grant) back to that unit. Then the unit waits, if
necessary, for the previous master to finish its tenure, after
which the unit takes mastership by asserting ABB_ (address bus
busy). When the current master is done with the address bus,
it negates ABB_. 

This system has no external cache or multiple processors; 
thus, there are no address-only transactions. If a unit wants 
the address bus, it will also want the data bus. After 
granting the address bus by asserting BG_, the arbiter then 
grants the data bus by asserting DBG_. 

Both BG_ and DBG_ remain asserted until the requesting 
unit takes mastership or withdraws its request by negating 
BR_. If there are no pending bus requests, the arbiter "parks" 
the CPU by granting it the busses. If the CPU is parked, it 
does not have to take the time to request the bus, thereby 
reducing the time for the memory access. If the CPU is parked 
and the PRC requests the bus, the arbiter unparks the CPU and 
grants the bus to the PRC. 
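A single arbitration decision can be sketched as a pure function. Signal names follow the text, and active-low levels are modeled as 0 = asserted, as the trailing underscore indicates. The function name `arbitrate` and the tie-break are hypothetical. The text does not say who wins when the CPU and PRC request at the same time, so this sketch assumes the CPU wins. The BG_-then-DBG_ sequence is collapsed into one step.

```python
def arbitrate(cpu_br_: int, prc_br_: int) -> dict:
    """One arbitration decision for the two bus masters (0 = asserted)."""
    cpu_wants = cpu_br_ == 0
    prc_wants = prc_br_ == 0

    if prc_wants and not cpu_wants:
        owner, parked = "PRC", False   # unpark the CPU if it was parked
    elif cpu_wants:
        owner, parked = "CPU", False   # ASSUMPTION: CPU wins if both request
    else:
        owner, parked = "CPU", True    # no pending requests: park the CPU

    # No address-only transactions, so the data bus grant goes to the
    # same unit as the address bus grant.
    cpu = 0 if owner == "CPU" else 1
    prc = 0 if owner == "PRC" else 1
    return {"CPU_BG_": cpu, "CPU_DBG_": cpu,
            "PRC_BG_": prc, "PRC_DBG_": prc, "parked": parked}


if __name__ == "__main__":
    print(arbitrate(1, 1))  # idle bus: CPU parked
    print(arbitrate(1, 0))  # PRC request: grant to PRC
```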
C. TESTBENCH

The Testbench is the highest level in the design 
hierarchy. It connects the CPU, PRC, memory, and arbitration 
unit. This module establishes the system clock rate and 
controls the simulation time. 

D. CPU 

The CPU module simulates PowerPC-603 memory accesses. 
The Sequencer is a sub-module of the CPU which makes the 
Testbench able to simulate every transaction relevant to the 
memory and PRC. These transactions can occur in any order. 
Many of the possible '603 transactions are not applicable to 
this particular system configuration. For example, none of 
the "address only" transactions are relevant, since they are 
for systems with multiple processors or second-level caches. 
Bus arbitration is accurately modeled, including the pipelined 
address tenures. 

E. MEMORY



This module emulates the main memory of the system. For 
simulation efficiency, the memory has only 128 bytes of physical
address space, enough for four 32-byte burst reads. The
address bus width allows a virtual address space of four 
Gbytes. Accesses to addresses past the first 128 bytes map to 
addresses within the first 128 bytes. 
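The text does not say which address bits the memory model keeps. A simple modulo-128 mapping is one natural reading, and this sketch assumes it. The function names are hypothetical.

```python
PHYSICAL_BYTES = 128     # size of the memory model, from the text
BURST_BYTES = 32         # one four-beat burst
ADDRESS_SPACE = 1 << 32  # four Gbytes addressable on the bus


def physical_address(address: int) -> int:
    """Fold a bus address into the 128-byte model (assumed modulo mapping)."""
    if not 0 <= address < ADDRESS_SPACE:
        raise ValueError("address does not fit on the 32-bit address bus")
    return address % PHYSICAL_BYTES


def burst_slot(address: int) -> int:
    """Which of the four 32-byte burst blocks an address falls into."""
    return physical_address(address) // BURST_BYTES


if __name__ == "__main__":
    print(hex(physical_address(0x12345678)), burst_slot(0x12345678))
```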

The time required for memory accesses is determined by
the parameters Delay1 and Delay2. The heading in
the file memory.v describes how to adjust these parameters to
achieve a realistic memory access rate.

There were two significant decisions made about the main 
memory design. First, the memory emulates a 60-ns DRAM 
memory. With a 60-ns memory and a 66-MHz system clock, the
four-beat burst operation takes 8-3-3-3 cycles, that is, eight
cycles for the first beat and three more cycles for each of 
the three remaining beats. 

The second design decision was to add a cancel feature to 
the main memory chip. The memory module has an input called 
CANX which cancels the current read operation. It is through 
this signal that the PRC stops the memory module from 
delivering data to the CPU when the PRC already has the data. 

Another option would be to put the PRC between the CPU 
and Memory, not allowing a read request to get to the memory 
chip until after the PRC had checked its contents. This
scheme would increase the time of all memory accesses. 

F. ARBITER

The Arbiter emulates the external bus arbitration unit, 
implemented as a Finite State Machine (FSM) corresponding to 
the state diagram in Figure 2.

The memory unit in this Testbench is capable of handling 
up to two memory accesses in the pipeline at a time, which is 
the maximum that the CPU will ever cause. Adding the PRC to 
the system creates the possibility of three accesses in the 
pipe. For example, the PRC could initiate a third address 
tenure before the first of two CPU transactions is complete. 
This potential problem is handled by the Arbiter which keeps 
track of the pipelining depth. It will not grant the address 
bus to any unit if that address tenure would put a third
transaction in the pipeline. Rather, the Arbiter will stall 
until the data tenure from the first transaction is complete, 
after which the Arbiter will grant the address bus to the 
requesting unit. 
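The pipelining-depth rule amounts to a counter that rises with each granted address tenure and falls with each completed data tenure. The sketch below is a hypothetical illustration of that bookkeeping, not the Arbiter's Verilog. The class and method names are invented.

```python
class PipelineDepthGuard:
    """Tracks transactions whose address tenure has started but whose
    data tenure has not yet completed."""

    MAX_DEPTH = 2  # the memory model handles at most two accesses

    def __init__(self) -> None:
        self.depth = 0

    def may_grant_address_bus(self) -> bool:
        # A new address tenure must not put a third transaction in the pipe.
        return self.depth < self.MAX_DEPTH

    def address_tenure_started(self) -> None:
        if not self.may_grant_address_bus():
            raise RuntimeError("third transaction would enter the pipeline")
        self.depth += 1

    def data_tenure_completed(self) -> None:
        if self.depth == 0:
            raise RuntimeError("no transaction outstanding")
        self.depth -= 1
```

When `may_grant_address_bus()` is false, the Arbiter stalls and withholds the grant until `data_tenure_completed()` lowers the depth.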




States:
A: Start
B: Grant CPU addr bus
C: Park CPU
D: Grant CPU data bus
E: Grant PRC addr bus
F: Wait for PRC
G: Grant PRC data bus

Inputs: [CPU_BR_, PRC_BR_]
Outputs: [CPU_BG_, CPU_DBG_, PRC_BG_, PRC_DBG_]

Numbers refer to Verilog state numbers.



Figure 2. State diagram for Arbiter FSM. 



G. TEST RESULTS

Testing the Testbench itself was important to establish 
that the models matched the behavior described in the PowerPC
User's Manual. The Testbench passed all tests of reads,
writes, and burst operations, in various sequences of
transactions and using an assortment of memory access delays. 

Figure 3 shows the fastest possible burst operations, as 
if the memory access time were not the limiting factor. Note 
again that the address tenure of the second transaction can 
start before the data tenure of the first transaction is 
complete.




Figure 3. Burst write, then burst read. Delay=0. [cWaves output]



Figure 4 shows a burst write transaction with an access 
delay of three cycles and a delay of one cycle in between each 
beat. A realistic 60-ns DRAM will have a delay of 8-3-3-3 
rather than the 3-1-1-1 shown here. The PRC, however, should be
able to supply data this quickly.




Figure 4. Burst write, burst read. Delay=3-1-1-1. [cWaves output]
III. PRC BEHAVIORAL MODEL DESIGN PHASE



This chapter presents the development of the behavioral 
models for the PRC. A simple pseudocode model is presented 
first. This model was used to develop an appropriate data 
structure and block diagram for the PRC. The individual 
blocks in this block diagram were implemented with Verilog 
behavioral modules and tested together to verify the 
behavioral model of the PRC. The next step was to convert 
each module into a hardware model compatible with Epoch, 
detailed in the next chapter. 

A. PSEUDOCODE MODEL



The behavior of the PRC is explained in detail in the 
paper by Fouts & Billingsley (1994, p.113) and summarized in 
the Introduction chapter of this thesis. Another way of 
summarizing this behavior is through a pseudocode model as 
shown in Figure 5, which is just detailed enough to identify 
the most significant capabilities the PRC must have. The 
purpose of taking this approach was to clarify the function of 
the PRC and to aid in identifying specific behaviors of this 
cache which the hardware needs to exhibit. 

B. DATA STRUCTURE

A possible data structure for the PRC is shown in Figure
6. Each of the 128 lines within the PRC must contain two
addresses, some status information, and data. The two
addresses are required to maintain the memory access pattern. 

There are also two seven-bit pointers, each containing a 
value in the range of zero to 127. The ActiveLine pointer 
contains the number of the line that is currently being used 
by the PRC. The ReplaceLine pointer contains the number of 
the next line to be replaced when a new line is needed. 
*** PRC BEHAVIOR MODEL IN PSEUDOCODE ***

// CAR = current address register
// MRMA = most recent memory address
// PredMA = predicted memory address

always at negative edge of HRESET_
    clear all status flags;
    put PRC in IDLE state;
    ActiveLine = 0; ReplaceLine = 0;

<IDLE>
wait for next transaction
CASE (transaction)
    data burst-read:
        if CAR hits in PRC,                  // PRC has requested data
            switch ActiveLine to line that was hit;
            send data to CPU;
            send cancel signal to memory;
            predict next address;
            if next address is not already in PRC,
                read next address;
                store in ActiveLine;
                update MRMA and PredMA;
        else if CAR misses,                  // PRC does not have requested data
            switch ActiveLine to the next ReplaceLine;
            if this is the first miss for this line,
                store this address in MRMA;
            if this is the second miss for this line,
                initiate search for next ReplaceLine;
                predict next address;
                if next address not already in PRC,
                    read next address;
                    store in ActiveLine;
                    update MRMA and PredMA;

    burst-write, or write:
        if CAR hits,
            flush matching line;

    data read or instruction transaction:
        ignore;
endcase;

goto IDLE;



Figure 5. PRC Pseudocode Model.
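As a cross-check on Figure 5, the same behavior can be approximated by the Python sketch below. It is a hypothetical transaction-level model, not the thesis's Verilog. The class and method names are invented for illustration, timing and data payloads are omitted, and 32-bit address wrap-around is assumed. ReplaceLine simply advances round-robin instead of using the Valid/Aged search that the Line Manager performs.

```python
from dataclasses import dataclass
from typing import List, Optional

ADDRESS_MASK = (1 << 32) - 1  # assumption: 32-bit addresses


@dataclass
class Line:
    valid: bool = False
    mrma: int = 0      # most recent memory address
    pred_ma: int = 0   # predicted memory address


class PRCModel:
    """Transaction-level sketch of Figure 5 (no timing, no data payloads)."""

    def __init__(self, n_lines: int = 128) -> None:
        self.lines = [Line() for _ in range(n_lines)]
        self.active_line = 0
        self.replace_line = 0
        self.line_empty = True
        self.prefetched: List[int] = []  # addresses the PRC chose to read

    def _hit(self, address: int) -> Optional[int]:
        for i, line in enumerate(self.lines):
            if line.valid and line.pred_ma == address:
                return i
        return None

    def _predict_and_fetch(self, car: int) -> None:
        line = self.lines[self.active_line]
        nar = (car + (car - line.mrma)) & ADDRESS_MASK
        if self._hit(nar) is None:           # next address not already in PRC
            self.prefetched.append(nar)      # read next address
            line.pred_ma, line.mrma, line.valid = nar, car, True

    def burst_read(self, car: int) -> bool:
        """Return True when the PRC supplies the data (memory is cancelled)."""
        hit = self._hit(car)
        if hit is not None:
            self.active_line = hit
            self._predict_and_fetch(car)
            return True
        self.active_line = self.replace_line
        if self.line_empty:                  # first miss for this line
            self.lines[self.active_line].mrma = car
            self.line_empty = False
        else:                                # second miss for this line
            # Simplification: round-robin instead of the Valid/Aged search.
            self.replace_line = (self.replace_line + 1) % len(self.lines)
            self.line_empty = True
            self._predict_and_fetch(car)
        return False

    def write(self, car: int) -> None:
        hit = self._hit(car)
        if hit is not None:
            self.lines[hit].valid = False    # flush matching line
```

In this model, reads of 20h and 40h leave the first line predicting 60h. Reads of 60h and 80h then hit and extend the prediction to A0h. A write to A0h flushes that line.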
Line: PredMA (0:26) | MRMA (0:26) | Status (V, A) | DATA (32 bytes, four 64-bit words)

Pointers: ActiveLine, ReplaceLine

PredMA = Predicted Memory Address
MRMA = Most Recent Memory Address
V = Valid
A = Aged



Figure 6. PRC Data Structure. 
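The field widths in Figure 6 can be cross-checked with a little arithmetic. The sketch below reads PredMA(0:26) and MRMA(0:26) as the upper 27 bits of a 32-bit address in PowerPC numbering, where bit 0 is the most significant bit. The remaining 5 bits are the byte offset within a 32-byte line. It also assumes the status field holds only the V and A bits. Both readings are interpretations, not statements from the text. The constant names are hypothetical.

```python
ADDRESS_BITS = 32
LINE_BYTES = 32                                 # one four-beat burst
OFFSET_BITS = (LINE_BYTES - 1).bit_length()     # 5 byte-offset bits
LINE_ADDRESS_BITS = ADDRESS_BITS - OFFSET_BITS  # 27 bits, i.e. (0:26)
N_LINES = 128
POINTER_BITS = (N_LINES - 1).bit_length()       # 7 bits: ActiveLine, ReplaceLine
STATUS_BITS = 2                                 # assumption: Valid and Aged only


def line_address(byte_address: int) -> int:
    """Drop the byte offset within a 32-byte line (the low-order 5 bits)."""
    return byte_address >> OFFSET_BITS


BITS_PER_LINE = 2 * LINE_ADDRESS_BITS + STATUS_BITS + 8 * LINE_BYTES
TOTAL_BITS = N_LINES * BITS_PER_LINE + 2 * POINTER_BITS

if __name__ == "__main__":
    print(BITS_PER_LINE, TOTAL_BITS)
```

Under these assumptions each line needs 312 bits. The whole structure, including both seven-bit pointers, needs 39,950 bits, just under 5 Kbytes.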



C. BLOCK DIAGRAM

The pseudocode model revealed several specific tasks the 
PRC must be able to accomplish. Identifying and clarifying 
these tasks resulted in the development of six blocks within 
the PRC. These blocks are shown in the block diagram of 
Figure 7 and are described briefly here. 

The Snooper watches transactions between the CPU and 
memory, raising appropriate signals if the transaction is one 
in which the PRC is interested. 

The Line Manager contains the Address List and Line 
Replacement Unit as sub-blocks. The Address List contains all 
the recently-accessed memory addresses and all the predicted 
addresses. The Line Replacement Unit determines which of the
128 lines will be replaced the next time a new line is needed.
These two blocks are grouped together because they share
status information about the lines and work closely together
for line management. 

The Predictor module uses its two input addresses to 
predict its output address. 

The Data List stores 128 lines of data, 32 bytes in each 
line, which is the amount of data in each burst read or burst 
write.

The Bus Interface handles the protocol of data transfers
into and out of the PRC.

Finally, the Controller coordinates the actions of all
the other functional blocks to accomplish the mission of the
PRC.
PRC BLOCK DIAGRAM



D. CONTROLLER



This module is a Finite State Machine which coordinates 
the actions of all the other functional blocks of the PRC. 
All control signals are synchronous with the system clock. 
HRESET_ causes the Controller to go to the IDLE state. The
state output table and state diagram are shown in Figures 8
and 9, respectively.



[Controller state output table. States: IDLE, TEST_CAR(R), SEND_DATA, TEST_NAR, FETCH_DATA, IS_LINE_EMPTY, PREDICT_MA, STORE_CAR, TEST_CAR(W), FLUSH_LINE. Outputs: a_select, test, predict, store, flush, send, new_replace, fetch.]

Figure 8. Controller State Output Table. 




Figure 9. Controller State Diagram. 
E. SNOOPER



This module watches the system bus activity and makes 
appropriate reports to the PRC Controller. 

If the transaction is a data burst read or any kind of 
write and if the address parity is correct, then two actions 
occur. First, read or write is asserted as appropriate. 
Second, the address is placed in the Current Address Register 
(CAR). The snoop_ignore signal tells this unit to ignore the
current transaction, because it was initiated by the Bus 
Interface Unit. The snoop_ignore signal must be asserted 
concurrently with the transfer attributes. 

Reads other than data burst reads, such as single-beat reads and
instruction fetches, are ignored by the PRC. The CAR is updated only on transactions
relevant to the PRC. 

Due to the two-stage pipelining capability of the PowerPC 
with respect to memory accesses, a second address tenure can 
occur shortly after the first, well before the first data 
tenure is complete. To compensate for this, the read and 
write outputs of the Snooper remain asserted until acknowledged
by the Controller with hold. The rising edge of hold
indicates that the read or write signal was received by the 
Controller. The Snooper then can negate these signals but 
must leave CAR alone until hold is negated. After hold is 
negated, CAR can be updated to the new address. 
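The read/write and hold handshake can be sketched as a small state holder. The one-deep `pending` buffer is an assumption. The text says only that CAR may be updated once hold is negated. Because the Arbiter never allows more than two transactions in the pipeline, one buffered address seems sufficient. The transaction labels and class name are hypothetical.

```python
RELEVANT = {"burst_read", "write", "burst_write"}  # hypothetical labels


class Snooper:
    """Sketch of the read/write + CAR handshake with the Controller's hold."""

    def __init__(self) -> None:
        self.car = None       # Current Address Register
        self.read = False
        self.write = False
        self.hold = False
        self.pending = None   # assumption: one-deep buffer for a pipelined tenure

    def address_tenure(self, address, kind, parity_ok=True, snoop_ignore=False):
        if snoop_ignore or not parity_ok or kind not in RELEVANT:
            return            # ignored: CAR is left unchanged
        if self.hold or self.read or self.write:
            self.pending = (address, kind)  # CAR still in use by the Controller
        else:
            self._latch(address, kind)

    def _latch(self, address, kind):
        self.car = address
        self.read = kind == "burst_read"
        self.write = not self.read

    def set_hold(self, value: bool) -> None:
        if value and not self.hold:             # rising edge: acknowledged
            self.read = self.write = False      # negate read/write, keep CAR
        elif not value and self.hold and self.pending is not None:
            self._latch(*self.pending)          # hold negated: update CAR
            self.pending = None
        self.hold = value
```

A burst read latches its address in CAR. A write arriving during the same burst waits in `pending` until the Controller raises and then lowers hold.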
F. LINE MANAGER



This module contains the address list, status flags for 
each line (Valid, Aged), a general status flag (line_empty),
the line replacement unit, and two pointers
(ActiveLine, ReplaceLine). On HRESET_, Valid=0 (all lines),
Aged=0 (all lines), line_empty=1, ActiveLine=0.

The MRMA output is always the MRMA of the ActiveLine. 
The line_empty flag indicates that the currently active line 
has no addresses in it yet; therefore, the addresses cannot be 
used by the PRC to make a prediction. 

The input a_select determines which address input is used
for a particular operation. The two address inputs are the CAR
and the NAR.

When the Line Manager receives a test signal, it compares 
the input address with the contents of the PredMA List. If 
there is a match with the CAR, it asserts the hit signal and 
changes the ActiveLine pointer to the line number of the hit. 

If there is a miss with the CAR, then the ActiveLine 
switches to the same line to which ReplaceLine points.

If, during a test, there is a match with the NAR, two 
actions occur. First, hit is asserted. Second, the value in 
ActiveLine becomes irrelevant since it will not be used. If 
there is a miss with the NAR, the ActiveLine must remain 
unchanged from the test. 

The fetch_done signal from the Bus Interface causes the 
NAR to be stored in PredMA [ActiveLine] , the CAR to be stored 
in MRMA [ActiveLine] , the Valid flag to be set, and the Aged 
flag to be reset. 
The flush signal causes the current ActiveLine to become
invalid by setting Valid[ActiveLine] = 0.

The store signal causes the input address to be stored 
into the MRMA of the ActiveLine. This is only used for the 
first address in a new line. The store signal also causes the 
line_empty flag to be reset. 

Line replacement: ReplaceLine always points to the line
to be replaced at the next PRC miss. HRESET_ causes this to
be zero.

As soon as the PRC starts predicting the first address
for a line, it asserts new_replace. The replacement unit then
finds a new line to mark as the next ReplaceLine according to
the following procedure:

Done = false;
repeat
    ReplaceLine = ReplaceLine + 1;   (mod 128 addition)
    if not (Valid[ReplaceLine]) then
        Done = true;
    elseif (all_lines_are_valid AND Aged[ReplaceLine]) then
        Done = true;
    else
        Aged[ReplaceLine] = 1;
until Done;
line_empty = 1;

In words, the Line Replacement Unit searches sequentially 
for the next line with invalid data and marks that line as the 
next line to be replaced. If all lines contain valid data, 
then it scans for the next line that is "aged," indicated by 
a set Aged flag. As it scans for an aged line, it sets the
Aged bits in the "unaged" lines it passes. Therefore, as it 
wraps around in the search for an aged line, it will 
eventually come upon one, even if none were aged when the 
search began. 

All of this occurs while the PRC is fetching data. 
Therefore, the PRC has several clock periods in which to 
complete the search. 
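The scan is easy to check with a short behavioral model. The Python below is a sketch that renders the pseudocode above, not the thesis's Verilog. The list-of-booleans representation of the Valid and Aged flags and the function name next_replace_line are hypothetical.

```python
NUM_LINES = 128  # number of lines in the PRC


def next_replace_line(replace_line, valid, aged):
    """Return the next ReplaceLine, setting Aged on lines passed over.

    Sketch of the Line Replacement Unit search. `valid` and `aged` are
    hypothetical lists of booleans with one entry per line; `aged` is
    updated in place, as the hardware flags would be.
    """
    all_lines_are_valid = all(valid)
    while True:
        replace_line = (replace_line + 1) % NUM_LINES  # mod 128 addition
        if not valid[replace_line]:
            return replace_line  # invalid lines are replaced first
        if all_lines_are_valid and aged[replace_line]:
            return replace_line  # otherwise the next aged line
        aged[replace_line] = True  # age each line passed over


# Usage: a full PRC with no aged lines, search starting from line 5.
valid = [True] * NUM_LINES
aged = [False] * NUM_LINES
print(next_replace_line(5, valid, aged))  # 6, found after one full lap
```

The usage case shows why the search always ends. When no line is aged, the first lap ages every line, and the scan then stops at the line just past its starting point.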



G. PREDICTOR

The Predictor module has two address inputs, the Most 
Recent Memory Address (MRMA) and the Current Address (stored 
in the Current Address Register, CAR) . It has a single 
output, the Next Address which is stored in the Next Address 
Register, NAR. 

This module calculates the Next Address based on the Most
Recent Memory Address and the Current Address. The rising
edge of predict initiates the prediction calculation. The
original equation is

NAR = CAR + (CAR - MRMA)

which is implemented as

NAR = 2*CAR - MRMA.

The output NAR remains latched and valid until the next
leading edge of predict.
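As a worked example of the prediction equation, the sketch below evaluates NAR = 2*CAR - MRMA on the addresses used later in the Prediction Test. It assumes that results wrap modulo 2^32, which matches a 32-bit PowerPC address. The function name is hypothetical.

```python
ADDR_MASK = 0xFFFFFFFF  # assumed 32-bit address space; results wrap


def predict_next_address(car, mrma):
    """NAR = CAR + (CAR - MRMA), computed as 2*CAR - MRMA mod 2**32."""
    return (2 * car - mrma) & ADDR_MASK


print(hex(predict_next_address(0x20, 0x00)))    # 0x40
print(hex(predict_next_address(0x1A0, 0x180)))  # 0x1c0
```

Writing the equation as 2*CAR - MRMA saves one subtraction, because the doubling is only a one-bit shift.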
H. DATA LIST

The inputs to the Data List are upload, download, and
ActiveLine. The 256-bit bus data_line is both an input and an
output.

An upload signal causes the Data List to store the data 
on data_line into the address specified by ActiveLine. A 
download signal causes the Data List to assert onto data_line 
the data in the address specified by ActiveLine. 

I. BUS INTERFACE UNIT 

This module handles the protocol of data transfers into
and out of the PRC, coordinating these activities through the
use of a Finite State Machine.

When this module receives a fetch signal, it latches the 
address in the NAR and requests the bus for a burst read. It 
stores the incoming data until all four beats have been
received. Then, it uploads the data into the Data List and
asserts fetch_complete.

When this module receives a send signal, it sends a 
cancel signal (CANX) to the memory module, downloads data from 
the Data List and then sends the data to the CPU. When the
transfer is finished, it asserts send_done.
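A 256-bit data_line holds four 64-bit beats, so the Bus Interface Unit has to pack four beats into one line on a fetch and unpack them on a send. The Python sketch below illustrates that packing. It assumes that the first beat occupies the most significant 64 bits, which follows the PowerPC convention of bit 0 as the MSB. The thesis does not state the ordering, and the function names are hypothetical.

```python
BEAT_BITS = 64       # 256-bit line / four beats
BEATS_PER_BURST = 4  # a burst read delivers four beats
BEAT_MASK = (1 << BEAT_BITS) - 1


def assemble_line(beats):
    """Pack four 64-bit beats into one 256-bit data_line value.

    Assumption: the first beat lands in the most significant 64 bits.
    """
    if len(beats) != BEATS_PER_BURST:
        raise ValueError("a burst read delivers exactly four beats")
    line = 0
    for beat in beats:
        if not 0 <= beat <= BEAT_MASK:
            raise ValueError("beat does not fit in 64 bits")
        line = (line << BEAT_BITS) | beat
    return line


def split_line(line):
    """Inverse of assemble_line: recover the beats in bus order for a send."""
    return [(line >> (BEAT_BITS * (BEATS_PER_BURST - 1 - i))) & BEAT_MASK
            for i in range(BEATS_PER_BURST)]


line = assemble_line([0x1111, 0x2222, 0x3333, 0x4444])
print(split_line(line) == [0x1111, 0x2222, 0x3333, 0x4444])  # True
```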

J. PREDICTION TESTS 

There are two large-scale tests included in this thesis. 
The first is the Prediction Test. The second is the Line 
Replacement Test. Together, these tests are sufficient to 
demonstrate that the behavioral model functions as desired. 
Once the behavioral model of the PRC passed these tests, it
was ready for conversion to a hardware model.

The tests are both conducted by connecting the behavioral
model of the PRC to the Testbench described in the previous
chapter and running a simulation with a sequence of events.
The sequence of events for the Prediction Test is included in
the sequencer4.v file. The sequence of events for the Line
Replacement Test is located in the sequencer5.v file. The
following procedure lists the steps necessary to conduct a
test:

1. Change directories (cd) to the ...verilog/behavior/
directory.

2. Modify the file verilog_arguments so that it lists
sequencer4.v or sequencer5.v, as desired, and all the
source files for the PRC and the Testbench.

3. Modify the file testbench.v to set the simulation
duration as described in the heading of the desired
sequencer. Modify the trace flags in every file
listed in verilog_arguments as described in the
sequencer file.

4. At the Unix command prompt, enter the command
verilog -f verilog_arguments.

The Verilog-XL outputs of both tests are included in the 
appendices. Together, these tests show that this behavioral 
model performs all the desired functions. 

The Prediction Test, using Sequencer4, causes a series of
CPU transactions that tests the ability of the PRC to make the
prediction calculation and to fetch the data. The
transactions are as follows:



burst_read at 00h:   The PRC stores this address.

burst_read at 20h:   The PRC should predict a next address
                     of 40h and then fetch the data from
                     that address.

burst_read at 180h:  The PRC should store this address in
                     a new line.

burst_read at 1A0h:  The PRC should predict a next address
                     of 1C0h and then fetch the data from
                     that address.

burst_read at 40h:   This data is already in the PRC, so
                     the PRC should send it to the CPU
                     and then fetch data from 60h.

burst_write at 1C0h: This data is in the PRC, so this
                     line should be flushed.

burst_read at 60h:   The PRC should deliver this data to
                     the CPU and then fetch the data at
                     80h.

burst_read at 100h:  The PRC should start a new line and
                     store this address.
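The expected behavior listed above can be checked against a small sequential model. The Python sketch below is not the thesis's Verilog model. It ignores bus timing and pipelining, and two of its ordering choices are assumptions: new_replace runs before fetch_done, and no fetch occurs when the predicted address already hits in the PRC. Class and method names are hypothetical.

```python
NUM_LINES = 128
ADDR_MASK = 0xFFFFFFFF


class PRCModel:
    """Sequential sketch of the PRC read path (hypothetical, untimed)."""

    def __init__(self):  # state after HRESET_
        self.valid = [False] * NUM_LINES
        self.aged = [False] * NUM_LINES
        self.pred_ma = [0] * NUM_LINES
        self.mrma = [0] * NUM_LINES
        self.active_line = 0
        self.replace_line = 0
        self.line_empty = True
        self.log = []

    def _test(self, addr):
        """Return the valid line whose PredMA matches addr, else None."""
        for line in range(NUM_LINES):
            if self.valid[line] and self.pred_ma[line] == addr:
                return line
        return None

    def _new_replace(self):
        """Line Replacement Unit search, following the pseudocode above."""
        all_valid = all(self.valid)
        while True:
            self.replace_line = (self.replace_line + 1) % NUM_LINES
            line = self.replace_line
            if not self.valid[line] or (all_valid and self.aged[line]):
                break
            self.aged[line] = True
        self.line_empty = True

    def _predict_and_fetch(self, car):
        nar = (2 * car - self.mrma[self.active_line]) & ADDR_MASK
        if self._test(nar) is not None:
            return  # assumed: data already in the PRC is not refetched
        self.log.append(("fetch", nar))
        line = self.active_line  # fetch_done updates the active line
        self.pred_ma[line], self.mrma[line] = nar, car
        self.valid[line], self.aged[line] = True, False

    def burst_read(self, car):
        hit = self._test(car)
        if hit is not None:
            self.active_line = hit
            self.log.append(("send", car))
            self._predict_and_fetch(car)
            return
        self.active_line = self.replace_line
        if self.line_empty:  # first address of a new line
            self.mrma[self.active_line] = car
            self.line_empty = False
            self.log.append(("store", car))
        else:
            self._new_replace()
            self._predict_and_fetch(car)

    def burst_write(self, car):
        hit = self._test(car)
        if hit is not None:  # the prefetched copy is now stale
            self.active_line = hit
            self.valid[hit] = False
            self.log.append(("flush", car))


prc = PRCModel()
for op, addr in [("r", 0x00), ("r", 0x20), ("r", 0x180), ("r", 0x1A0),
                 ("r", 0x40), ("w", 0x1C0), ("r", 0x60), ("r", 0x100)]:
    (prc.burst_read if op == "r" else prc.burst_write)(addr)
for event, addr in prc.log:
    print(f"{event:5} {addr:03X}h")
```

The logged events follow the list above: store 00h, fetch 40h, store 180h, fetch 1C0h, send 40h, fetch 60h, flush 1C0h, send 60h, fetch 80h, store 100h.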

This test successfully demonstrates a majority of the
capabilities of the PRC. It shows when the Line Manager
selects new lines, when and how the Predictor functions, and
when the CPU starts a read or write, along with the data
involved. It also shows when the Bus Interface Unit fetches
data from memory, and the Data List reports the flow of data
into and out of itself.

The only significant behavior not exercised by this test
is the function of the Line Replacement Unit when the PRC is
full. That is handled with Sequencer5 in the Line Replacement
Test.

The Line Replacement Test was accomplished by a series of
CPU transactions that quickly fill the PRC. The test shows
that the Line Replacement Unit correctly selected invalid
lines to be replaced first. When all the lines in the PRC
contained valid data, the Line Replacement Unit executed the
aging algorithm described under line replacement in the Line
Manager section.



K. CONCLUSION 

At this point in the development of the PRC, the
behavioral model was functioning properly. Therefore, it
could be converted piece by piece into a hardware model. This
was accomplished using the subset of Verilog understood by
Epoch, as described in the next chapter.
IV: PRC STRUCTURAL MODEL DESIGN PHASE



This chapter presents the development of the hardware
model of the PRC. In this phase of the design process, each
of the behavioral blocks developed in the previous phase was
implemented in hardware. Converting the blocks in order of
increasing complexity worked well, because it made it easier
to concentrate first on learning how to use Epoch.

Like the behavioral models, the hardware (structural)
models are Verilog files. Epoch uses these Verilog files to
create VLSI layouts. From those layouts, Epoch calculates
timing information and generates new VerilogOut files with
this timing information. As each block is converted into
hardware, the new VerilogOut model can replace the original
behavioral model in the Testbench for testing with Verilog-XL.
The following hardware blocks resulted from this procedure.

Each section of this chapter also includes a figure
displaying some important geometric information about the
module, including surface area and transistor count. This
information can be obtained from Epoch with the shell command
geostat -trancount <module name>.

A. PRC 



The top-level module simply interconnects the modules
described in the following sections. The geostat
information is shown in Figure 10. Of particular significance
are the transistor count and the total chip area.
Bounding Box:

9080.748 x 11278.224 microns, 102414707.226 square microns. 
357.510 x 444.025 mils, 158743.109 square mils. 

Number of Pins = 316. 

Number of unique cells = 6. 

Number of Datapaths = 1 
Number of Sub-Glues = 5 
Total Number of Instances = 6 

Total number of nets = 498. 

Total metal1 layer route length = 2120297.98 microns.

Total metal2 layer route length = 699802.75 microns. 

Total metal3 layer route length = 0.00 microns. 

Total route length = 2820100.74 microns. 

Total number of vias = 2460. 

Total number of segments = 16989. 

Reading transistor view . . . 

Total number of 454310 transistors. 

0.349 Square mils per Transistor. 

2.862 Transistors per square mil. 

Power Dissipation = 4742486.500 micro-watts. 



Figure 10. PRC Geostat Information. [Epoch output] 



B. CONTROLLER

This module is a Finite State Machine which coordinates 
the actions of all the other functional blocks of the PRC. 
All control signals are synchronous with the system clock. 
HRESET_ causes the Controller to go to the IDLE state. The 
revised state output table (Figure 11) and the revised state 
diagram (Figure 12) give more details. 

Of significance are the wait states added to the state
diagram of the behavioral model. These changes are shown in
boldface in the Revised Controller State Output Table. The
changes were required by the Line Manager, in which there is
a significant propagation delay for the addresses. This delay
is described in more detail in the Line Manager section of
this chapter and is a prime candidate for future work to
improve this design of the PRC. The geostat information is
shown in Figure 13.
Figure 11. Revised Controller State Output Table. Changes
highlighted.
Figure 12. Revised Controller State Diagram.
Bounding Box:

267.516 x 215.964 microns, 57773.825 square microns. 
10.532 x 8.503 mils, 89.550 square mils. 

Number of Pins = 26. 

Number of unique cells = 18. 

Number of Standard cells = 60 
Total Number of Instances = 60 

Total number of nets = 71. 

Total metal1 layer route length = 7073.14 microns.

Total metal2 layer route length = 7073.46 microns. 

Total metal3 layer route length = 0.00 microns. 

Total route length = 14146.60 microns. 

Total number of vias = 226. 

Total number of segments = 1074. 

Reading transistor view . . . 

Total number of 460 transistors. 

0.195 Square mils per Transistor. 

5.137 Transistors per square mil. 

Power Dissipation = 3665.888 micro-watts. 



Figure 13. Controller Geostat Information. [Epoch output] 



C. SNOOPER

This module watches the system bus activity and makes 
appropriate reports to the PRC Controller. 

If the transaction is a data-burst read or any kind of 
write and if the address parity is correct, then the read or 
write signal is asserted as appropriate. Also, the address is 
placed in the CAR. The snoop_ignore signal tells this unit to 
ignore the current transaction, because it was initiated by 
the Bus Interface Unit . The snoop_ignore signal must be 
asserted concurrently with the transfer attributes. Reads 
that are not burst or data related are ignored by the PRC.
The CAR is updated only on transactions relevant to the PRC. 

Due to the two-stage pipelining capability of the PowerPC 
with respect to memory accesses, a second address tenure can 
occur shortly after the first, well before the first data 
tenure is complete. To compensate for this, the read and 
write outputs of the Snooper remain asserted until 
acknowledged by the Controller with hold. The rising edge of 
hold indicates that the read or write signal was received by 
the Controller. The Snooper then can negate these signals, 
but must leave CAR alone until hold is negated. After hold is 
negated, CAR can be updated to the new address. 

In Stage 0, the transfer attributes are latched in 
registers. Combinational logic determines if these transfer 
attributes represent a valid read or a valid write and if the 
address parity is correct. If the transaction is valid and 
one in which the PRC is interested, then Stage 0 raises a 
transaction_waiting signal. 
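As an illustration of the parity part of Stage 0, the Python sketch below checks a 32-bit address against four address-parity bits. Several details are assumptions that do not come from this thesis. It assumes odd parity with one parity bit per address byte, as on the PowerPC 60x bus. It also assumes that the four bits are packed with AP0, which covers A[0:7], in the most significant position. Function names are hypothetical, and transfer-type decoding is omitted.

```python
def address_parity_bits(addr):
    """Generate odd parity, one bit per byte, for a 32-bit address.

    Assumed packing: result bit 3 is AP0 (byte A[0:7]), bit 0 is AP3.
    """
    ap = 0
    for i in range(4):
        byte = (addr >> (24 - 8 * i)) & 0xFF
        ones = bin(byte).count("1")
        ap = (ap << 1) | (1 - ones % 2)  # make each byte's total count odd
    return ap


def address_parity_ok(addr, ap):
    """Stage 0 style check: every byte plus its parity bit has odd weight."""
    for i in range(4):
        byte = (addr >> (24 - 8 * i)) & 0xFF
        parity_bit = (ap >> (3 - i)) & 1
        if (bin(byte).count("1") + parity_bit) % 2 != 1:
            return False
    return True


print(address_parity_ok(0x20, address_parity_bits(0x20)))  # True
print(address_parity_ok(0x20, 0b1111))                     # False
```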

A Finite State Machine in Stage 1 sits in the IDLE
state until it receives the transaction_waiting signal. Then
it latches the signals needed from Stage 0, resets the
transaction_waiting signal, and waits for the hold signal
to go low. A high hold signal indicates that the PRC is not
done with the previous transaction. Once hold goes low, the
read and write flags are set according to the type of the
current transaction. Also, the input address is stored in the
Current Address Register. The FSM then waits for the rising
edge of hold before returning to the IDLE state, where it can
check whether another transaction is waiting. The geostat
information is shown in Figure 14.
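The Stage 1 handshake can be modeled clock by clock. The Python sketch below follows the sequence just described, but it is only an illustration. The state names, the clock method, and the representation of a waiting transaction as a (kind, address) pair are all hypothetical, and Stage 0 is reduced to its transaction_waiting output.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT_HOLD_LOW = auto()   # previous transaction still in progress
    WAIT_HOLD_RISE = auto()  # read/write asserted, awaiting acknowledgement


class SnooperStage1:
    """Clock-by-clock sketch of the Stage 1 FSM (hypothetical names)."""

    def __init__(self):
        self.state = State.IDLE
        self.read = self.write = False
        self.car = None
        self._latched = None
        self._prev_hold = False

    def clock(self, waiting, hold):
        """Advance one clock. `waiting` is (kind, addr) from Stage 0 or None.

        Returns True on the clock where Stage 1 takes the transaction,
        i.e. where transaction_waiting is reset.
        """
        taken = False
        hold_rose = hold and not self._prev_hold
        if self.state is State.IDLE and waiting is not None:
            self._latched = waiting
            taken = True
            self.state = State.WAIT_HOLD_LOW
        elif self.state is State.WAIT_HOLD_LOW and not hold:
            kind, addr = self._latched
            self.read, self.write = kind == "read", kind == "write"
            self.car = addr  # CAR changes only while hold is low
            self.state = State.WAIT_HOLD_RISE
        elif self.state is State.WAIT_HOLD_RISE and hold_rose:
            self.read = self.write = False  # Controller acknowledged
            self.state = State.IDLE
        self._prev_hold = hold
        return taken
```

Because CAR is written only in the WAIT_HOLD_LOW state while hold is low, a second pipelined address tenure cannot overwrite the address that the Controller is still using.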
Bounding Box:

607.500 x 409.536 microns, 248793.127 square microns. 
23.917 x 16.123 mils, 385.630 square mils. 

Number of Pins = 88.

Number of unique cells = 19. 

Number of Standard cells = 169 
Total Number of Instances = 169 

Total number of nets = 219. 

Total metal1 layer route length = 28547.10 microns.

Total metal2 layer route length = 14615.39 microns. 

Total metal3 layer route length = 0.00 microns. 

Total route length = 43162.49 microns. 

Total number of vias = 464. 

Total number of segments = 2268. 

Reading transistor view . . . 

Total number of 3608 transistors. 

0.107 Square mils per Transistor. 

9.356 Transistors per square mil. 

Power Dissipation = 26722.156 micro-watts. 



Figure 14. Snooper Geostat Information. [Epoch output] 



D. LINE MANAGER



This structural model uses a high speed RAM (hsram) for
the MRMA List. The CAR is stored into this RAM on a store or
fetch_done signal.

The predicted_ma_list is a register file for storing
predicted memory addresses. This list is composed of 128
address registers, 128 equality comparators and 128 Valid
status flags. The NAR is stored in this list at the
fetch_done pulse. If there is a match with the input address
(in_addr), a priority encoder (ENC_C) determines which line
matches.
The Line Replacement Unit determines the next line to be
replaced whenever the PRC needs to start a new line. It first
selects invalid lines. If all the lines are valid, then it
selects lines that have been "aged." A priority encoder
(ENC_1) chooses the line with the lowest index among all the
lines that can be replaced. If all lines are valid, the
output enable (oe) signal of the encoder is used to cause
aging. A line X can be replaced if the following holds true
for that line:

not (X = ActiveLine) AND {not Valid[X] OR (all_lines_valid
AND Aged[X])}

Aging is accomplished by the use of a seven-bit counter
(ager_counter), initially set to zero. When the cause_aging
signal from the encoder is high, the counter advances. A
decoder (DEC_B) output causes the appropriate Aged flag to be
set.
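The replacement condition, the lowest-index priority encoder, and the aging counter can be combined into one sketch. The Python below is an interpretation, not the Epoch netlist. It assumes that each cause_aging step sets Aged[ager_counter] through DEC_B and then advances the seven-bit counter. It also assumes that aging happens only when all lines are valid and no line is replaceable. All function names are hypothetical. Unlike the sequential scan in the behavioral model, this version always picks the lowest-indexed candidate.

```python
NUM_LINES = 128


def replaceable(x, active_line, valid, aged, all_lines_valid):
    """The replacement condition for line x, as stated in the text."""
    return x != active_line and (not valid[x] or (all_lines_valid and aged[x]))


def priority_encode(requests):
    """ENC_1-style encoder: (lowest requesting index, any-request flag)."""
    for index, requested in enumerate(requests):
        if requested:
            return index, True
    return 0, False


def select_replace_line(active_line, valid, aged, ager_counter):
    """Return (ReplaceLine or None, updated ager_counter); `aged` changes in place."""
    all_lines_valid = all(valid)
    while True:
        requests = [replaceable(x, active_line, valid, aged, all_lines_valid)
                    for x in range(NUM_LINES)]
        line, found = priority_encode(requests)
        if found:
            return line, ager_counter
        if not all_lines_valid:
            return None, ager_counter  # only the active line is invalid
        aged[ager_counter] = True                      # DEC_B sets one Aged flag
        ager_counter = (ager_counter + 1) % NUM_LINES  # seven-bit wraparound
```

For example, with every line valid, nothing aged, ActiveLine = 0 and the counter at 0, the first two aging steps mark lines 0 and 1. Line 0 is excluded as the active line, so the function returns line 1 with the counter at 2.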

Changes in the CAR or NAR have a propagation delay of
25 ns (1.8 cycles) through the input address multiplexer
(in_addr mux). This delay required the addition of wait
states in the Controller before each of the tests. The
Revised Controller State Output Table and the Revised
Controller State Diagram found in the Controller section of
this chapter show the required changes. The geostat
information is shown in Figure 15.
Bounding Box:

6704.064 x 8897.364 microns, 59648499.103 square microns. 
263.940 x 350.290 mils, 92455.359 square mils. 

Number of Pins = 505. 

Number of unique cells = 22. 

Number of Standard cells = 123 
Number of Blocks = 1 
Number of Sub-Glues = 2 
Total Number of Instances = 126 



Total number of nets = 357. 

Total metal1 layer route length = 1017746.50 microns.

Total metal2 layer route length = 463265.70 microns. 

Total metal3 layer route length = 0.00 microns. 

Total route length = 1481012.19 microns. 

Total number of vias = 2157. 

Total number of segments = 10524. 

Reading transistor view . . . 

Total number of 207467 transistors. 

0.446 Square mils per Transistor. 

2.244 Transistors per square mil. 

Power Dissipation = 1777694.500 micro-watts. 



Figure 15. Line Manager Geostat Information. [Epoch output] 



E. PREDICTOR

The purpose of this module is to calculate the Next
Address (stored in NAR) based on the Most Recent Memory Address
(MRMA) and the Current Address (in the CAR). The prediction
calculation is

NAR = 2*CAR - MRMA

In this structural implementation of the Predictor, the
predict signal latches the CAR and MRMA registers.
The subtraction is accomplished as a two's complement addition
with a high speed adder.
The CAR is multiplied by two, an arithmetic shift left of
one bit. The most significant bit of the CAR is not retained, 
as it will not have an effect on the 27-bit output of the 
adder. This will adversely affect address prediction only 
around the midpoint of the four gigabytes of memory. The 
applicable Golden Rule of computer design "is to make the 
common case fast: In making a design tradeoff, favor the 
frequent case over the infrequent case." (Hennessy, 1990) 

A number is negated in two's complement by inverting all
the bits and adding '1'. The MRMA is negated by inverting all
its bits. Adding the required '1' is implemented as a
Carry-In to the adder.
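The shift, invert, and carry-in arrangement can be checked against ordinary modular subtraction. The Python sketch below mirrors the datapath described above. It assumes that the 27-bit operands are line addresses, meaning byte addresses with the five low-order bits of a 32-byte line removed. That interpretation of the 27-bit width, and the function name, are assumptions.

```python
WIDTH = 27                   # adder output width quoted in the text
MASK = (1 << WIDTH) - 1


def predict_hw(car, mrma):
    """NAR = 2*CAR - MRMA built from shift, invert, and carry-in."""
    doubled = (car << 1) & MASK   # arithmetic shift left; CAR's MSB drops off
    inverted = ~mrma & MASK       # invert every bit of MRMA
    carry_in = 1                  # the '1' that completes the negation
    return (doubled + inverted + carry_in) & MASK  # carry out is discarded


# 1A0h and 180h as 32-byte line addresses (>> 5); the result is shifted back.
print(hex(predict_hw(0x1A0 >> 5, 0x180 >> 5) << 5))  # 0x1c0
```

Because the adder discards the carry out, the sum is exactly 2*CAR - MRMA modulo 2^27. This is also why dropping the CAR's most significant bit during the shift does not change the 27-bit result.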

The Epoch TACTIC tool reported the propagation delay from 
predict to NAR to be 4.90 ns. The geostat information is 
shown in Figure 16. 
Bounding Box:

261.900 x 895.824 microns, 234616.293 square microns. 
10.311 x 35.269 mils, 363.656 square mils. 

Number of Pins = 113.

Number of unique cells = 10.

Number of Blocks = 107 

Total Number of Instances = 107 

Total number of nets = 230. 

Total metal1 layer route length = 12158.68 microns.

Total metal2 layer route length = 15209.06 microns. 

Total metal3 layer route length = 0.00 microns. 

Total route length = 27367.74 microns. 

Total number of vias = 392. 

Total number of segments = 1793. 

Reading transistor view . . . 

Total number of 3027 transistors.

0.120 Square mils per Transistor. 

8.324 Transistors per square mil. 

Power Dissipation = 27722.887 micro-watts. 



Figure 16. Predictor Geostat Information. [Epoch output] 



F. DATA LIST

This module stores the data retrieved from memory in
anticipation of a request by the CPU. The basic memory cell
is the Epoch part hsramoe (high speed RAM with output enable).
Since each hsram has a maximum word size of 128 bits, two
hsram parts are used in parallel to get the required 256-bit
width.

An upload signal causes the Data List to store the data
on data_line into the address specified by ActiveLine. The
input upload has to be inverted to match the active-low WR
input of the Epoch hsram component. A download signal causes
the Data List to assert onto data_line the data in the address
specified by ActiveLine. This signal also has to be inverted
for the same reason.
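The Data List's two-part organization can be sketched behaviorally. The Python model below is only an illustration. It assumes that one RAM holds the upper 128 bits of data_line and the other holds the lower 128 bits. It also assumes that the download path drives an active-low enable, as the text implies, and it names that input oe_n. The class name and field names are hypothetical.

```python
LINES = 128                       # one 256-bit entry per PRC line
HALF_BITS = 128                   # word size of one hsram part
HALF_MASK = (1 << HALF_BITS) - 1


class DataListSketch:
    """Two 128-bit RAMs side by side acting as one 256-bit memory.

    Hypothetical model: both RAM controls are active low, so the
    active-high upload and download strobes pass through inverters.
    """

    def __init__(self):
        self.upper = [0] * LINES  # assumed: data_line bits 255..128
        self.lower = [0] * LINES  # assumed: data_line bits 127..0

    def access(self, active_line, upload, download, data_line=0):
        wr_n = not upload    # inverter feeding the active-low WR input
        oe_n = not download  # inverter feeding the active-low enable
        if not wr_n:
            self.upper[active_line] = (data_line >> HALF_BITS) & HALF_MASK
            self.lower[active_line] = data_line & HALF_MASK
        if not oe_n:
            return (self.upper[active_line] << HALF_BITS) | self.lower[active_line]
        return None  # outputs disabled (high impedance in hardware)
```

If the Bus Interface Unit generated active-low strobes directly, as suggested below, the two `not` operations in access would disappear. That change corresponds to removing the inverters.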

Both inverters can probably be removed if the Bus
Interface Unit makes the upload and download signals active
low. Removing them could only improve the response time of
the data memory.

Epoch calculated the following timing delays: 

download -> hsramoe.DOUT 2.3 ns 

ActiveLine -> hsramoe.DOUT 7.3 ns 

A design alternative is to use the regular speed version, 
ramoe, which gives the following timing delays: 

download -> ramoe.DOUT 4 ns

ActiveLine -> ramoe.DOUT 16 ns

Using this slower RAM is possible, but would require a 
significant modification to the PRC behavior to handle the 
longer delay and would add a cycle delay to CPU reads when 
there is a hit in the PRC. 
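The extra cycle follows from simple arithmetic. Assuming the 15 ns system clock period cited in the Testing section, a path delay that exceeds one period needs a second clock. The Python sketch below applies that rule to the ActiveLine-to-DOUT delays. It ignores setup times and the other delays in the path, so it is only a first-order estimate. The function name is hypothetical.

```python
import math

CLOCK_PERIOD_NS = 15.0  # system clock period given in the Testing section


def cycles_needed(delay_ns, period_ns=CLOCK_PERIOD_NS):
    """Whole clock periods required to cover a propagation delay."""
    return math.ceil(delay_ns / period_ns)


for part, delay in [("hsramoe", 7.3), ("ramoe", 16.0)]:
    print(part, cycles_needed(delay))  # hsramoe 1, ramoe 2
```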

Putting the VerilogOut file of this module into the 
original PRC behavioral model for mixed-mode simulation caused 
a timing error that had to be corrected in the Bus Interface 
Unit behavioral model. After an upload to the Data List, 
data_line must remain valid long enough to meet the data hold 
time requirement of the Epoch part hsramoe. The geostat 
information is shown in Figure 17. 
Bounding Box:

3834.792 x 3222.936 microns, 12359289.299 square microns. 
150.976 x 126.887 mils, 19156.938 square mils. 

Number of Pins = 282. 

Number of unique cells = 3. 

Number of Standard cells = 2 

Number of Blocks = 2 

Total Number of Instances = 4 



Total number of nets = 269. 

Total metal1 layer route length = 198805.54 microns.

Total metal2 layer route length = 52952.76 microns. 

Total metal3 layer route length = 0.00 microns. 

Total route length = 251758.30 microns. 

Total number of vias = 728. 

Total number of segments = 2422. 

Reading transistor view . . . 

Total number of 214712 transistors. 

0.089 Square mils per Transistor. 

11.208 Transistors per square mil. 

Power Dissipation = 2181481.250 micro-watts. 



Figure 17. Data List Geostat Information. [Epoch output]



G. BUS INTERFACE

This module connects the PRC with the system bus. It 
handles the protocol of data transfer in and out of the PRC. 

When this module receives a fetch signal, it latches the
address in the NAR and requests the bus for a burst read. It
stores the incoming data until all four beats have been
received. Then it uploads the data into the Data List and
asserts fetch_done. If there is a parity error during the
fetch, the Bus Interface informs the Controller by asserting
fetch_abort, and the transaction is canceled.

When this module receives a send signal, it sends a
cancel signal (CANX) to the memory module, downloads data from
the Data List and then sends the data to the CPU. When the
transfer is finished, it asserts send_done.

The coordination of these activities is accomplished 
through the use of two Finite State Machines. One acts as an 
address bus master. The other controls the flow of data. The 
geostat information is shown in Figure 18. 



Bounding Box: -6264, -6408, 2246040, 1972980. 

2252.304 x 1979.388 microns, 4458183.285 square microns. 
88.673 x 77.929 mils, 6910.198 square mils. 

Number of Pins = 448. 

Number of unique cells = 56. 

Number of Standard cells = 1393 

Number of Sub-Glues = 1 

Total Number of Instances = 1394 

Total number of nets = 1843. 

Total metal1 layer route length = 676479.94 microns.

Total metal2 layer route length = 469079.94 microns. 

Total metal3 layer route length = 0.00 microns. 

Total route length = 1145559.87 microns. 

Total number of vias = 9679. 

Total number of segments = 44298. 

Reading transistor view . . . 

Total number of 24403 transistors. 

0.283 Square mils per Transistor. 

3.531 Transistors per square mil. 

Power Dissipation = 237269.750 micro-watts. 



Figure 18. Bus Interface Geostat Information. [Epoch output]



H. TESTING

The most significant large-scale test of the structural
model is the Prediction Test, which is similar to the
Prediction Test of the behavioral model. The test runs the
same series of CPU transactions to exercise all functional
blocks of the PRC. The sequence of events for the Prediction
Test is included in the sequencer4.v file.

The following steps are required to conduct a test: 

1. Change directories (cd) to the ...verilog/hardware/
directory on the Computer Center (CC) system.

2. At the Unix command prompt, enter the command
verilog -f verilog_arguments.



The Verilog-XL output of the test is included in the
appendices. This test shows that the structural model of the
PRC performs the desired functions. The output of the
structural model test differs from the output of the
behavioral model test mainly because the structural model
does not contain the same display commands, which interfere
with the Epoch compilation of the modules. Other display
commands were added to the Testbench, which is still a
behavioral model. The displays are sufficient to show that
the PRC performs as desired.

While compiling the source files, Verilog-XL reports four 
warnings about implicit wires having no fanin. These wires 
are labeled NCO and NCI, deriving their initials from "not- 
connected." They are unused outputs on a couple of Epoch 
parts. Therefore, these warnings can be ignored. 

The section with comments about SDF Annotation is the 
result of incorporating the Epoch timing analysis into the 
Verilog model. Once that annotation is complete, the actual 
simulation begins. 

The error messages at the beginning of the simulation can 
be ignored. These error messages are generated by Epoch parts 
and indicate improper signal values or timing. All these
errors occur before the system hard reset and are expected. 
Having those errors after the system hard reset would have 
indicated a real problem. 

Once the system has reset, the CPU starts its series of
transactions, beginning with reads from addresses 00h and 20h.
The comment "PRC requested the bus" indicates that the PRC is
prefetching data. It appears that the prefetch occurs before
the start of the second CPU transaction, but in reality it
occurs just after the second CPU address tenure, which is not
shown in the report. Also not shown, because of the limited
display commands in the structural PRC, is the data
prefetched by the PRC. That the data is correct can be seen
later in the report, when the PRC sends the data to the CPU.

During the CPU-to-memory transactions, the four beats of
data are 60 ns apart. When the CPU reads from address 40h,
the speed advantage of the PRC is demonstrated: there are now
only 15 ns between beats. That is the period of the system
clock and is therefore the maximum rate at which the CPU can
receive data.

The write to address 1C0h occurred after the PRC had
prefetched that data. The PRC therefore had to flush the
prefetched data, because it was no longer valid. Later, when
the CPU performs a read from the same address, it can be seen
from the read data and from the timing (60 ns per beat) that
the CPU is getting the data from main memory. In accordance
with its design, the PRC did not try to give the stale data to
the CPU.
V. CAD TOOLS



The three primary design tools used in the development of 
this PRC were Verilog-XL, cWaves and Epoch. This chapter 
describes some of the particularly useful features of these 
tools and gives some tips for using these tools together. 

A. VERILOG-XL 

Verilog-XL allows the modeling of circuits in a 
programming language. Circuits can be modeled by behavior or 
structure. For the complex design of the PRC, it was
convenient to start by dividing the design into six blocks and
then using Verilog to model the behavior of each block. This 
allowed clarification of the required behaviors, deferring the 
search for hardware solutions until after the desired 
behaviors were well defined. 

Currently, Verilog-XL is available only on the Computer 
Center (CC) network. The following steps make it easier to 
use from an Electrical and Computer Engineering (ECE) 
workstation:

1. Add the following line to the .cshrc file in the ECE
account: alias rcc 'xhost in50204.cc.nps.navy.mil;
rlogin -l <username> in50204.cc.nps.navy.mil'.

2. Re-source the session by typing "sc <return>".

3. Type "rcc <return>" to log into the CC account.

4. Add the following line to the .cshrc file in the CC
account: alias remote3 'setenv DISPLAY
sun3.ece.nps.navy.mil:0.0'. The .cshrc file can
contain similar lines for other workstations.

5. Re-source as in Step 2.

Now the ECE workstation becomes the display for the CC 
workstation. Typing "filemgr &" will call up the CC file 
manager.

Typing "verilog <return>" should give a list of options 
for use with Verilog-XL and will verify access to the program. 
One particularly useful option is to put all the arguments in 
a file, such as verilog_arguments and put the following line 
in the CC .cshrc file: 

alias veri 'verilog -f verilog_arguments ' 

Typing "veri" is much easier than listing the names of all the 
files that need to be included in the simulation. 

The Cadence online documentation can be accessed with the
command "openbook &". The Main Menu is the starting point.

The Alphabetical List on the bottom is the easiest way to find
the desired information. In this list there is a Verilog-XL
section which contains hyperlinks to the Verilog-XL Reference
Manual and Tutorial.

B. CWAVES

This tool is indispensable for the analysis of
complicated circuits. There is nothing like seeing a timing
diagram to track down design errors.

The database for the cWaves Viewer is created while
running the Verilog simulation. The highest level Verilog
module should have the following two lines in an "initial"
block:

$shm_open;

$shm_probe(<name>, "AS");

where <name> is the instance name of the module to be 
observed. More information about these $shm commands can be 
found in the cWaves Reference Manual, which is a little 
difficult to find. It is in the Cadence Online Library 
accessed with "openbook & <return>". Once the Main Menu
appears, select the Alphabetical List on the bottom. The 
cWaves Reference Manual is filed under Composer (Schematic 
Entry), Design Framework II. Section 4 of this manual is 
particularly useful. 



C. EPOCH

A circuit designer would find it very convenient if Epoch
took the raw behavioral models as input, but it does
not. Each behavioral block must be converted into a
structural model. Then, Epoch can automatically generate a
Very Large Scale Integrated (VLSI) circuit layout using a rule
set from a specific manufacturer. From the layout, Epoch
performs a timing analysis of the circuit and generates a new
Verilog file, which includes the timing information. This new
file then can replace the behavioral model for resimulation
with Verilog-XL. This allows the designer to verify each
block as it is designed. cWaves can be used to track down
timing errors.
Epoch is available on the ECE system. To access Epoch,
add "/tools3/epoch/bin" to the "set path" command in the
.cshrc file. Also, add "setenv CASCADE /tools3/epoch".

The Epoch User's Tutorial and the Epoch Verilog Interface
Reference are both very useful. The former is located at
/tools3/epoch/data/examples/tutorial. The latter can be
accessed through pull-down menus in Epoch:

Help => On-Line Manual...

Sometimes calling up this manual causes a FrameViewer error, 
but the manual does come up after a slight delay. 

The VerilogOut option proved very useful in the 
development of the PRC. With this option, Epoch creates a new 
Verilog file after laying out a design. The new model can be 
inserted in place of the old behavioral model for simulation 
with Verilog-XL. The Verilog Interface reference describes 
how this is done. In addition to the procedures described 
there, it will be necessary to take a few extra steps. 

1. If the files must be moved from the vout directory to
   another directory for simulation with Verilog-XL,
   correct the $sdf_annotate path in the .v file.

2. In all the behavioral files, add a `timescale
   directive like the one in the .v file generated by
   Epoch. This must appear before the "module"
   statement.

3. It may be necessary to copy primelib.v from
   /tools3/epoch/data/verilog into the CC directory.
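Step 2 can be illustrated with a minimal behavioral file. This is only a sketch: the module name behav_example is hypothetical, and the 1ns / 10ps units are an assumption. In practice the directive should be copied verbatim from the .v file that Epoch writes, because the time unit decides how every # delay in the file is interpreted.

```verilog
// Hypothetical behavioral file. The `timescale line must precede the
// "module" keyword; the units here are an assumption and should match
// the .v file that Epoch actually generates.
`timescale 1ns / 10ps

module behav_example (out_q, in_d);
  output out_q;
  input  in_d;
  reg    out_q;

  // With a 1 ns time unit and 10 ps precision, #7.5 means 7.5 ns.
  always @(in_d)
    out_q <= #7.5 in_d;
endmodule
```

If the behavioral files carried a different time unit than the Epoch output, the same #7.5 would be read on a different scale, which is why the directive is copied rather than rewritten.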

The PowerPC uses bit zero as the most significant bit of
buses, so it was convenient to follow that convention in this
PRC design. For example, the PowerPC address bus is
designated A[0:31]. Unfortunately, this causes a problem with
the VerilogOut program, which reorders some of the indices and
connects buses in reverse order. This problem seems to be
unique to the VerilogOut file generation. The physical layout
itself gets connected correctly regardless of the index
numbering convention. Resolving this problem required
renumbering the indices of all modules used for Epoch input so
that the most significant bit had the highest index, such as
A[31:0].
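The renumbering is safe because Verilog connects vectors by position, leftmost bit to leftmost bit, rather than by matching index numbers. The minimal sketch below (module and signal names are hypothetical) shows that assigning A[0:31] to a bus declared [31:0] preserves both the numeric value and the MSB, whereas matching bits by index number reverses the bus. Whether VerilogOut's reordering worked exactly this way is an assumption; the sketch only shows how a reversed connection can arise.

```verilog
// Hypothetical sketch: positional versus index-matched bus connections.
module bus_renumber (A, A_r, A_bad);
  input  [0:31] A;      // PowerPC convention: bit 0 is the MSB
  output [31:0] A_r;    // renumbered convention: bit 31 is the MSB
  output [31:0] A_bad;  // same declaration, but wired by index number

  // Positional copy: A[0] -> A_r[31], ..., A[31] -> A_r[0].
  // The numeric value of the bus is unchanged.
  assign A_r = A;

  // For contrast: pairing bits by index number (A_bad[i] = A[i])
  // sends the MSB A[0] to A_bad[0], reversing the bus.
  genvar i;
  generate
    for (i = 0; i < 32; i = i + 1) begin : by_index
      assign A_bad[i] = A[i];
    end
  endgenerate
endmodule
```

With A = 32'h8000_0000 (only the MSB set), A_r also reads 32'h8000_0000, while A_bad reads 32'h0000_0001.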
VI. CONCLUSIONS AND RECOMMENDATIONS



A. CONCLUSIONS 

In conclusion, the objective of this research has been
met. This thesis presents a complete hardware design for the
PRC. The simulation results show that the PRC can deliver
data to the CPU at the rate of 8-1-1-1, that is, eight cycles
for the first beat and one cycle for each of the remaining
three beats. This performance is better than the performance
of main memory (8-3-3-3). With a little more work on the
design, the PRC should be able to deliver data at a rate of
4-1-1-1.
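As a worked check of these rates, the cycles needed for a full four-beat burst are the sums of the beat counts. The conversion to nanoseconds assumes the 15 ns bus clock period (66 MHz) defined in the Testbench of Appendix B.

```latex
\begin{aligned}
\text{PRC } (8\text{-}1\text{-}1\text{-}1)\!:&\quad 8+1+1+1 = 11 \text{ cycles} = 11 \times 15\ \text{ns} = 165\ \text{ns}\\
\text{Main memory } (8\text{-}3\text{-}3\text{-}3)\!:&\quad 8+3+3+3 = 17 \text{ cycles} = 17 \times 15\ \text{ns} = 255\ \text{ns}\\
\text{Target } (4\text{-}1\text{-}1\text{-}1)\!:&\quad 4+1+1+1 = 7 \text{ cycles} = 7 \times 15\ \text{ns} = 105\ \text{ns}
\end{aligned}
```

The first beat costs eight cycles in both measured cases, so the present saving of six cycles per burst comes entirely from the three later beats.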

Aguilar proposed a design consisting of six modules which
together would comprise the PRC. He took a bottom-up
approach, designing four of those six modules and testing each
independently, but not together (Aguilar, 1995). As a result,
the designs of these modules require modifications to enable
them to function correctly together. Rather than redesigning
the four modules, the approach taken during this research was
top-down. That is, a single working behavioral model was
divided into six behavioral models that functioned together,
and then each of the six behavioral models was converted into
a hardware model. The result is still a six-module design,
but the six modules of this design have different functions
than the six modules of the design by Aguilar. The top-down
approach worked exceedingly well to clarify the design and to
minimize inter-module signal problems.
This research required a total of three academic
quarters. The work during the first quarter primarily 
involved studying the problem, analyzing the design 
requirements, and learning about the PowerPC system. Two more 
quarters were required for the creation of the design, one 
quarter each for the behavioral design phase and the 
structural design phase. 

Epoch and Verilog-XL proved reliable and highly useful 
during the development of this hardware design. Verilog-XL 
performed the simulations necessary to verify the design. 
Epoch performed the VLSI circuit layout and timing analysis 
that were required by Verilog-XL in order to produce 
simulation results that could be considered accurate. 

Simulations with Verilog-XL are conveniently short while
testing small modules. However, simulations of the entire PRC
design typically ran for half an hour on a Sun SPARC-10
workstation. Similarly, on small designs Epoch runs fast enough
that a user could wait at the workstation, but it requires much
more time to compile complex modules. For example, Epoch takes
over an hour to compile the Bus Interface of the PRC and more
than three hours to compile the entire PRC.

Both Verilog-XL and Epoch have functions and options 
which are not readily apparent. That problem is compounded by
inadequate indexes in the user's manuals for each of these 
tools. On the other hand, the tutorials are very helpful for 
revealing some of those functions and options. 

Some of the options in Epoch require significant studying 
before use. The pull-down menus in Epoch could be better 
organized. Both of these characteristics work to make Epoch 
less user-friendly than it should be. 
B. RECOMMENDATIONS



As with any complex design, there is much more that a 
designer could do to improve this PRC. This section describes 
some areas of potential future research related to this 
hardware design. 

The first recommendation is to consider including the 
Arbiter on the PRC chip. This PRC design was developed for a 
PowerPC-603 microprocessor system, in which both the PRC and 
the CPU are candidates for bus mastership. This requires that 
there be a bus arbitration unit to prevent both devices from 
trying to use the bus simultaneously. The bus arbitration 
unit is a simple device whose function can be fulfilled with 
a single finite state machine (FSM). It would be very easy to
add this FSM to the PRC chip, eliminating the requirement to 
fabricate a separate integrated circuit chip. 

The second recommendation regards improving the
Line Manager design. The Line Manager is the block that
requires the wait states in the Controller State Diagram. The
impact of these wait states is a delay of three cycles in
determining if there is a hit within the PRC. Finding a way
of eliminating these wait states could improve the speed at
which the PRC delivers the first beat of data to the CPU and
the speed at which the PRC prefetches data from main memory.
Specifically, the performance would improve from 8-1-1-1 to
5-1-1-1. There is a strong chance that Epoch would prove useful
in this endeavor. Epoch has timing analysis routines and can
perform layouts in such a way as to minimize propagation
delays for critical signals. Epoch also has automatic buffer
sizing algorithms which could be used to ensure the output
signals of each part are buffered sufficiently to drive their
loads. These capabilities of Epoch do require considerable
CPU time. For example, running an automatic compilation on
the current design of the Bus Interface Unit takes over an
hour of actual CPU time on a Sun SPARC-10 workstation if the
buffer sizing option is selected.
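The projected 5-1-1-1 figure follows from removing the three wait-state cycles from the first beat only. This assumes, as the text states, that the wait states lie on the first-beat path and that the later beats are unaffected:

```latex
\begin{aligned}
t_{\text{first}} &= 8 - 3 = 5 \text{ cycles} \quad\Rightarrow\quad 5\text{-}1\text{-}1\text{-}1\\
t_{\text{burst}} &= 5 + 1 + 1 + 1 = 8 \text{ cycles}, \quad\text{versus } 8 + 1 + 1 + 1 = 11 \text{ cycles at present}
\end{aligned}
```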

The next recommendation is to study the rest of the
design for critical paths. With Epoch as an analysis tool, it
should be straightforward to analyze the entire PRC for critical
timing paths. Some timing limitations may be improved through
the buffer-sizing and timing-critical layout capabilities of
Epoch. Other timing limitations may require modifying the
design. The current PRC design includes only parts that were
available in the Epoch library. It may be possible to design
parts that outperform the Epoch parts.

The final recommendation regards fabrication. If the PRC 
design detailed in this thesis is to be fabricated, it must 
undergo two steps. First, the power rails should be studied 
using Epoch to determine if there is a requirement for 
additional power and ground rails. Second, the design must be 
put inside a pad ring. Epoch may be able to create the pad 
ring automatically with minimal intervention by the designer. 
APPENDIX A. LAYOUTS



This appendix contains the Very Large Scale Integrated
(VLSI) circuit layouts for the PRC. These layouts were
all generated by Epoch.
decreasing size, are the Bus Interface,
Predictor, Snooper, and Controller. [Epoch
output]
Figure A2. The PRC fully expanded. [Epoch output]













Figure A3. The Controller. [Epoch output]
Figure A4. The Snooper. [Epoch output]



Figure A5. The Line Manager expanded one level. The bottom
portion is shown in more detail in the next
figure. [Epoch output]













Figure A7. The Line Manager fully expanded. [Epoch output]
Figure A8. The Predictor fully expanded. [Epoch output]
Figure A9. The Data List fully expanded. [Epoch output]
Figure A10. The Bus Interface fully expanded. [Epoch output]
Figure A11. The Line Replacement Unit fully expanded. [Epoch output]
Figure A12. The Predicted Memory Address List fully
expanded. [Epoch output]
Figure A13. The 128-to-7 Priority Encoder fully expanded. [Epoch output]
Figure A14. The Predicted Address Register fully expanded.
[Epoch output]
APPENDIX B. TESTBENCH VERILOG FILES



This appendix contains the Verilog files for the 
Testbench. They are all behavioral models, used together to 
test the PRC design. The files are located on the Computer
Center system at joshua_u2/jrrobert/thesis/verilog/behavior. 

A. TESTBENCH



/******************************************************************************
* TESTBENCH
* Filename: testbench.v
* Author: Joseph R. Robert, Jr.
* Date: 24AUG95
* Revised: 10JAN96
*
* Purpose: This module is the highest level in the design hierarchy. It
* emulates a complete computer system, composed of
*   1. cpu: a PowerPC-603 microprocessor.
*   2. ram: random access memory.
*   3. arbiter: the bus arbitration unit.
*   4. prc: the predictive read cache under design.
*
* System configuration and features:
*   Single CPU
*   64-bit data bus
*   No out-of-order split-bus transactions.
*   Synchronous interface: all I/O sampled on rising edge of bus clock.
*   66 MHz system clock, 66 MHz CPU clock.
*
* Simulation should be done with a time unit = 1 ns.
*
******************************************************************************/

module testbench;

// Signal Declarations - conforms to PowerPC-603 notation

// Address Arbitration
wire CPU_BR_,        //Bus Request
     CPU_BG_;        //Bus Grant
tri1 ABB_;           //Address Bus Busy
tri1 TS_;            //Transfer Start (memory only, not I/O)

// Address bus
wire [0:31] A;       //Address (note Motorola's reverse notation)
wire [0:3]  AP;      //Address Parity
wire        APE_;    //Address Parity Error

// Transfer attributes
wire [0:4] TT;       //Transfer Type
wire [0:2] TSIZ;     //Transfer Size
wire [0:1] TC;       //Transfer Code
tri1 TBST_;          //Transfer burst
wire GBL_,
     CI_,
     WT_,
     CSE;

// Address Termination
tri1 AACK_;          //Address Acknowledge
reg  ARTRY_;         //Address Retry

// Data Arbitration
wire CPU_DBG_;       //Data Bus Grant
reg  DBWO_;          //Data Bus Write Only
tri1 DBB_;           //Data Bus Busy

// Data Transfer
wire [0:63] D;       //Data
wire [0:7]  DP;      //Data Parity
wire DPE_,           //Data Parity Error
     DBDIS_;         //Data Bus Disable

// Data Termination
tri1 TA_;            //Transfer Acknowledge
reg  DRTRY_;         //Data Retry
reg  TEA_;           //Transfer Error Acknowledge

// System control
reg  HRESET_;        //Hard Reset
wire PRC_BR_;        //PRC Bus Request
wire CANX;

//Declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0;

//Initialize values.
initial
begin
  DBWO_   = hi;  //Limits CPU to in-order transactions.
  TEA_    = hi;  //Only asserted for nonrecoverable bus error events.
  ARTRY_  = hi;  //Retries used only with multiprocessor or multi-
  DRTRY_  = hi;  //  level memory systems.
  HRESET_ = hi;
end



//define system clock, 66 MHz, T = 15 ns.
reg clk;
initial clk = 1;
always
begin
  #7 clk = 0;
  #8 clk = 1;
end



//Connect parts

cpu CPU1(CPU_BR_,CPU_BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,
         CI_,WT_,CSE,AACK_,ARTRY_,CPU_DBG_,DBWO_,DBB_,D,DP,
         DPE_,DBDIS_,TA_,DRTRY_,TEA_,clk);

memory MEM1(ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,CI_,WT_,CSE,AACK_,
            DBWO_,DBB_,D,DP,DPE_,DBDIS_,TA_,TEA_,CANX,clk);

arbiter ARB1(CPU_BR_,CPU_BG_,CPU_DBG_,PRC_BR_,PRC_BG_,PRC_DBG_,
             ABB_,DBB_,clk);

prc PRC1(CPU_BR_,PRC_BR_,PRC_BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,
         TBST_,AACK_,PRC_DBG_,DBB_,D,DP,DPE_,TA_,HRESET_,CANX,clk);



//run simulation
initial
begin
  //$shm_open;
  #5 HRESET_ = low;    //Reset entire system.
  #5 HRESET_ = hi;
  //#4000;
  //$shm_probe(PRC1, "AS");
  #152000 $finish;     //Adjust this time according to the instructions
                       //in the sequencers.
end

endmodule
B. CPU



/******************************************************************************
* PowerPC-603 CPU
* Filename: cpu.v
* Author: Joseph R. Robert, Jr.
* Date: 24AUG95
* Revised: 10JAN96
*
* Purpose: This module emulates the PowerPC-603 microprocessor. Note that
* most signals are active low. This makes it slightly more difficult to work
* one's way through all the double negatives in this code's conditional
* statements, but makes it much easier to correlate against the timing diagrams
* in the PowerPC-603 User Manual. This model uses the same notations for
* signals that connect to other modules.
* This module uses the sequencer module to determine the operations the CPU
* will perform. This model of the PowerPC-603 is capable of performing reads,
* writes, burst reads, and burst writes. It handles bus arbitration just like
* the '603 including the pipelined address tenures. Please refer to the
* PowerPC-603 User Manual for a detailed description of the nature and timing
* of each signal.
*
******************************************************************************/

module cpu (BR_,BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,CI_,WT_,CSE,AACK_,
            ARTRY_,DBG_,DBWO_,DBB_,D,DP,DPE_,DBDIS_,TA_,DRTRY_,TEA_,clk);

// Signals are defined in system.v.

input  BG_,AACK_,DBG_,DBWO_,DBDIS_,TA_,ARTRY_,DRTRY_,TEA_,clk;
output BR_,APE_,CI_,WT_,CSE,DPE_;
inout  [0:31] A;
inout  [0:63] D;
inout  [0:7]  DP;
inout  [0:4]  TT;
inout  [0:3]  AP;
inout  [0:2]  TSIZ;
inout  [0:1]  TC;
inout  ABB_,TS_,TBST_,GBL_,DBB_;

reg BR_,APE_,CI_,WT_,CSE,DPE_;
tri [0:31] A;
tri [0:63] D;
tri [0:7]  DP;
tri [0:4]  TT;
tri [0:3]  AP;
tri [0:2]  TSIZ;
tri [0:1]  TC;
tri ABB_,TS_,TBST_,GBL_,DBB_;
//declare variables, constants, parameters
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0,
          trace = FALSE;

//Address related 
wire [0:31] seq_addr; 
reg [0:31] addr_reg, address[0:1];
reg [0:31] a_reg; 
assign A = a_reg; 

reg [0:3] ap_reg, addr_parity_in, addr_parity_calc; 
assign AP = ap_reg; 

//Data related 

reg [0:63] data [0:1]; 

wire [0:63] seq_data; 

reg [0:63] d_reg, load_data, data_reg; 

assign D = d_reg; 
reg [0:255] line_reg, line [0:1]; 
wire [0:255] seq_line;

reg [0:7] dp_reg, d_parity_in, d_parity_calc;
assign DP = dp_reg; 

//Other external control signals 
reg Transfer_start [0:1]; 
reg abb_reg_, dbb_reg_, ts_reg_, tbst_reg_; 
assign ABB_ = abb_reg_; 
assign TS_ =ts_reg_; 
assign DBB_ = dbb_reg_; 
assign TBST_ = tbst_reg_; 

reg [0:4] Transfer_type [0:1]; 
wire [0:4] seq_Transfer_type;
reg [0:4] tt_reg; 
assign TT = tt_reg; 
parameter //for Transfer_type
  none              = 5'bz,
  write             = 5'b00010, //02
  write_atomic      = 5'b10010, //12
  read              = 5'b01010, //0A
  read_atomic       = 5'b11010, //1A
  burst_write       = 5'b00110, //06
  burst_read        = 5'b01110, //0E
  burst_read_atomic = 5'b11110; //1E

reg [0:2] Transfer_size [0:1];
wire [0:2] seq_Transfer_size; 
reg [0:2] tsiz_reg; 
assign TSIZ = tsiz_reg;

reg [0:1] Transfer_code [0:1]; 
wire [0:1] seq_Transfer_code; 
reg [0:1] tc_reg;
assign TC = tc_reg; 
parameter //for Transfer_code
  data_transfer     = 2'b00,
  touch_load        = 2'b01,
  instruction_fetch = 2'b10,
  reserved          = 2'b11;



//Other internal control signals 

reg need_bus_; 

wire need_bus_trigger_; 

reg AB_Master, DB_Master, Addr_termination;

wire qual_BG_, qual_DBG_;

reg [0:7] index; 

wire parked; 

wire pp; 

reg dpp; 

event transfer_acknowledged; 



//initialize signals
initial
begin
  a_reg <= 32'bz;
  ap_reg <= 4'bz;
  addr_parity_in <= 4'bz;
  addr_parity_calc <= 4'bz;
  addr_reg <= 32'bz;
  address[0] <= 32'bz;
  address[1] <= 32'bz;

  data[0] <= 64'bz;
  data[1] <= 64'bz;
  d_reg <= 64'bz;
  line[0] <= 256'bz;
  line[1] <= 256'bz;
  line_reg <= 256'bz;
  d_parity_in <= 8'bz;
  d_parity_calc <= 8'bz;
  dp_reg <= 8'bz;

  APE_ <= 'bz;
  BR_ <= hi;
  CI_ <= hi;
  CSE <= low;
  DPE_ <= 'bz;
  WT_ <= hi;
  abb_reg_ <= 'bz;
  dbb_reg_ <= 'bz;
  ts_reg_ <= 'bz;
  tbst_reg_ <= 'bz;
  Transfer_type[0] <= none;
  Transfer_type[1] <= none;
  tt_reg <= none;
  Transfer_size[0] <= 0;
  Transfer_size[1] <= 0;
  tsiz_reg <= 'bz;
  Transfer_code[0] <= reserved;
  Transfer_code[1] <= reserved;
  tc_reg <= 2'bz;
  Transfer_start[0] <= FALSE;
  Transfer_start[1] <= FALSE;

  AB_Master <= FALSE;
  DB_Master <= FALSE;
  Addr_termination <= FALSE;
  need_bus_ <= hi;
  dpp <= 0;
end



// 

sequencer SEQ1(seq_Transfer_size,clk,pp,seq_addr,seq_data,seq_line,
               seq_Transfer_type,seq_Transfer_code,need_bus_trigger_,ABB_);

always @(negedge need_bus_trigger_)
begin
  address[pp] <= seq_addr;
  data[pp] <= seq_data;
  line[pp] <= seq_line;
  Transfer_type[pp] <= seq_Transfer_type;
  Transfer_size[pp] <= seq_Transfer_size;
  Transfer_code[pp] <= seq_Transfer_code;
end



// 

//ADDRESS BUS TENURE 

// *** 1. Address bus arbitration 

always @(negedge need_bus_trigger_)
  need_bus_ = low;

//Parked means that the CPU can take the bus as soon as it needs it.
assign parked = (!BG_ & ABB_ & ARTRY_);

//If CPU needs bus, it needs to assert BR_ only if not parked.
always @(posedge clk)
  if (BR_ == hi)
    BR_ = #7 ~(need_bus_==low & parked==FALSE);

assign qual_BG_ = ~(need_bus_==low & parked==TRUE);

//Assume mastership
always @(posedge clk)
  if (qual_BG_ == low)
  begin
    abb_reg_ = #7 low;
    AB_Master = TRUE;
    BR_ <= #1 hi;
    need_bus_ <= #2 hi;
  end

// *** 2. Address Transfer

always @(posedge clk)
  if (qual_BG_ == low)
  begin
    addr_reg = address[pp];
    addr_parity_calc[0] <= ~^addr_reg[0:7];
    addr_parity_calc[1] <= ~^addr_reg[8:15];
    addr_parity_calc[2] <= ~^addr_reg[16:23];
    addr_parity_calc[3] <= ~^addr_reg[24:31];
    ts_reg_ = #7 low;

    Transfer_start[pp] <= TRUE;
    a_reg <= address[pp];
    ap_reg <= addr_parity_calc;
    tt_reg <= Transfer_type[pp];
    tsiz_reg <= Transfer_size[pp];
    tc_reg <= Transfer_code[pp];
    if (Transfer_type[pp] == burst_read
        || Transfer_type[pp] == burst_write)
      tbst_reg_ <= low;

    //insert other address transfer characteristics here.
  end

always @(posedge clk)
  if (AB_Master & TS_==low)
  begin
    ts_reg_ = #7 hi;
    wait (AACK_==low);
    Addr_termination = TRUE;
  end
always @(posedge clk)
  if (Addr_termination)
  begin
    #7 ts_reg_ <= 'bz;
    a_reg <= 'bz;
    ap_reg <= 'bz;
    tt_reg <= 'bz;
    tc_reg <= 'bz;
    tsiz_reg <= 'bz;
    tbst_reg_ <= 'bz;

    //insert other addr transfer characteristics here.
    abb_reg_ <= #2 hi;
    abb_reg_ <= #8 'bz;

    AB_Master = FALSE;
    Addr_termination = FALSE;
  end

// 

//DATA BUS TENURE

assign qual_DBG_ = ~(!DBG_ & DBB_ & DRTRY_);

always @(posedge clk)
begin
  if (TA_ == low)
    -> transfer_acknowledged;
end

always
begin
  #2 dpp = ~dpp;
  case (Transfer_type[dpp])
    none: begin end

    //Note: TS is an implied data bus request. CPU can assume mastership if it
    //has a qualified data bus grant.

    read: begin
      //wait for qualified data bus grant and transfer start.
      wait (qual_DBG_==low & Transfer_start[dpp]);

      @(posedge clk)                //assume data bus mastership
        dbb_reg_ <= #7 low;

      @(transfer_acknowledged)      //latch data and terminate read
        data[dpp] <= D;
      data_reg <= D;
      d_parity_in <= DP;

      Transfer_type[dpp] <= none;
      Transfer_code[dpp] <= reserved;
      Transfer_start[dpp] = FALSE;
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] <= ~^data_reg[56:61];
      if (trace) begin
        $display("CPU read %h from address %h.",
                 data[dpp], address[dpp]);
        $display(" Completed at time %d", $time);
      end

      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
      if (d_parity_in != d_parity_calc)
      begin
        $display("CPU: data parity error.");
        $display(" Calculated parity: %b", d_parity_calc);
        $display(" Received parity: %b", d_parity_in);
      end
    end

    write: begin
      data_reg = data[dpp];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] <= ~^data_reg[56:61];

      //wait for qualified data bus grant and transfer start.
      wait (qual_DBG_==low & Transfer_start[dpp]);
      @(posedge clk)                //assume data bus mastership
        dbb_reg_ = #7 low;
      d_reg <= data[dpp];
      dp_reg <= d_parity_calc;
      @(transfer_acknowledged)      //terminate write
        d_reg <= #7 64'bz;
      dp_reg <= #7 8'bz;

      Transfer_type[dpp] <= none;
      Transfer_start[dpp] = FALSE;
      if (trace) begin
        $display("CPU wrote %h to address %h.",
                 data[dpp], address[dpp]);
        $display(" Completed at time %d", $time);
      end

      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
    end

    burst_read: begin
      //wait for qualified data bus grant and transfer start.
      wait (qual_DBG_==low & Transfer_start[dpp]);
      @(posedge clk)                //assume data bus mastership
        dbb_reg_ <= #7 low;

      if (trace)
        $display("CPU started read from address %h at time %d.",
                 address[dpp], $time);
      repeat (4) begin
        @(transfer_acknowledged)    //latch beat
          data[dpp] <= D;
        data_reg <= D;
        d_parity_in = DP;

        #1 if (trace)
          $display(" CPU read: %h at %d", data[dpp], $time);
        d_parity_calc[0] <= ~^data_reg[0:7];
        d_parity_calc[1] <= ~^data_reg[8:15];
        d_parity_calc[2] <= ~^data_reg[16:23];
        d_parity_calc[3] <= ~^data_reg[24:31];
        d_parity_calc[4] <= ~^data_reg[32:39];
        d_parity_calc[5] <= ~^data_reg[40:47];
        d_parity_calc[6] <= ~^data_reg[48:55];
        d_parity_calc[7] = ~^data_reg[56:61];

        #2 if (d_parity_in != d_parity_calc)
        begin
          $display("CPU: data parity error.");
          $display(" Calculated parity: %b", d_parity_calc);
          $display(" Received parity: %b", d_parity_in);
        end
      end

      Transfer_type[dpp] <= none;
      Transfer_code[dpp] <= reserved;
      Transfer_start[dpp] <= FALSE;
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
    end

    burst_write: begin
      //wait for qualified data bus grant and transfer start.
      wait (qual_DBG_==low & Transfer_start[dpp]);
      if (trace)
        $display("CPU started write to address %h at time %d.",
                 address[dpp], $time);
      @(posedge clk)                //assume data bus mastership
        dbb_reg_ = #6 low;
      line_reg = line[dpp];
      data_reg = line_reg[0:63];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;
      d_reg = line_reg[0:63];

      #1 if (trace)
        $display(" CPU write beat 1: %h at %d", d_reg, $time);
      @(transfer_acknowledged);     //first beat done

      data_reg = line_reg[64:127];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;

      #7 d_reg = line_reg[64:127];

      #1 if (trace)
        $display(" CPU write beat 2: %h at %d", d_reg, $time);
      @(transfer_acknowledged);     //second beat done

      data_reg = line_reg[128:191];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;

      #7 d_reg = line_reg[128:191];

      #1 if (trace)
        $display(" CPU write beat 3: %h at %d", d_reg, $time);
      @(transfer_acknowledged);     //third beat done

      data_reg = line_reg[192:255];
      d_parity_calc[0] <= ~^data_reg[0:7];
      d_parity_calc[1] <= ~^data_reg[8:15];
      d_parity_calc[2] <= ~^data_reg[16:23];
      d_parity_calc[3] <= ~^data_reg[24:31];
      d_parity_calc[4] <= ~^data_reg[32:39];
      d_parity_calc[5] <= ~^data_reg[40:47];
      d_parity_calc[6] <= ~^data_reg[48:55];
      d_parity_calc[7] = ~^data_reg[56:61];
      dp_reg <= d_parity_calc;

      #7 d_reg = line_reg[192:255];

      #1 if (trace)
        $display(" CPU write beat 4: %h at %d", d_reg, $time);
      @(transfer_acknowledged);     //fourth beat done
      d_reg <= #7 64'bz;
      dp_reg <= #7 8'bz;
      line_reg <= #7 256'bz;

      Transfer_type[dpp] <= #7 none;
      Transfer_code[dpp] <= #7 reserved;
      Transfer_start[dpp] <= #7 FALSE;
      dbb_reg_ = #4 hi;
      dbb_reg_ = #8 'bz;
    end

    default: $display("CPU module has bad TT[%b] = %b", dpp,
                      Transfer_type[dpp], " at time %d.", $time);

  endcase
end

endmodule 
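The CPU model above spells out the data-parity computation as eight ~^ (reduction XNOR) lines each time it is needed. Reduction XNOR returns 1 when a byte holds an even number of ones, so every byte plus its parity bit carries an odd number of ones, that is, odd parity. A hedged sketch of how those lines could be factored into one function follows; the module and function names (lane_parity, odd_parity64) are hypothetical and not part of the thesis design. One difference is deliberate: the listing computes lane 7 over data_reg[56:61], six of the lane's eight bits, whereas the sketch assumes the intended range was the full lane, bits 56 through 63.

```verilog
// Hypothetical sketch, not part of the thesis design: computes the eight
// data-parity bits DP[0:7] of a 64-bit word D[0:63] with odd parity,
// using the same ~^ operation as the CPU model, one byte lane at a time.
module lane_parity (D, DP);
  input  [0:63] D;   // bit 0 is the MSB (PowerPC convention)
  output [0:7]  DP;  // DP[k] covers byte lane k, bits 8k through 8k+7

  function [0:7] odd_parity64;
    input [0:63] d;
    integer k;
    begin
      for (k = 0; k < 8; k = k + 1)
        // ~^ is 1 when the lane has an even number of ones, so each
        // lane plus its parity bit always contains an odd number of ones.
        odd_parity64[k] = ~^d[8*k +: 8];
    end
  endfunction

  assign DP = odd_parity64(D);
endmodule
```

For a word with only bits 7 and 63 set (64'h0100_0000_0000_0001), lanes 0 and 7 each hold a single one, so DP = 8'b0111_1110; a lane 7 select of [56:61] would miss bit 63 and report 1 for that lane instead.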



C. ARBITER 



/******************************************************************************
* BUS ARBITRATION UNIT
* Filename: arbiter.v
* Author: Joseph R. Robert, Jr.
* Date: 24AUG95
* Revised: 10JAN96
*
* Purpose: This module emulates the system's external bus arbitration unit.
* It is implemented as a Finite State Machine.
* There are only two possible bus masters in this system: the CPU and the PRC.
* Also, the address bus and data bus are each arbitrated for independently,
* though the data bus arbitration occurs after the corresponding address bus
* arbitration.
* If a unit wants the address bus, it asserts BR_. If the bus is available,
* the arbiter asserts BG_ back to that unit, which can then take mastership by
* asserting ABB_. When it is done with the address bus, it negates ABB_.
* It is assumed that if a unit wants the address bus, it will also want the
* data bus. "Address only" transactions will not occur in this system, since
* there is no external cache or multiprocessors. Therefore, after asserting
* BG_ to the requesting unit, the arbiter asserts DBG_ on the next cycle.
* BG_ and DBG_ are both asserted until the requesting unit takes mastership,
* unless the requesting unit withdraws its request by negating BR_.
* If there are no pending bus requests, the arbiter "parks" the CPU by
* granting it the buses. This reduces memory access time for the CPU. If the
* CPU is parked and the PRC then requests the bus, the CPU is unparked, and
* the arbiter can then grant the bus to the PRC.
* The PowerPC can conduct a second address tenure long before the first data
* tenure is complete. This pipelining has a maximum depth of two transactions,
* meaning that a third address tenure will not start before the first data
* tenure is complete. The Memory Unit in this Testbench is capable of handling
* that situation. However, adding the PRC to the system creates the
* possibility that the PRC will initiate a third address tenure before the
* first of two CPU transactions is complete. This situation is handled by this
* Arbiter, which keeps track of the pipelining depth. It will not grant the
* address bus to any unit if that address tenure would put a third transaction
* in the pipeline. Rather, the arbiter will stall until the data tenure from
* the first transaction is complete, and then will grant the address bus to the
* requesting unit.
*
******************************************************************************/

module arbiter (CPU_BR_,CPU_BG_,CPU_DBG_,PRC_BR_,PRC_BG_,PRC_DBG_,
                ABB_,DBB_,clk);

output CPU_BG_, CPU_DBG_, PRC_BG_, PRC_DBG_;
input  CPU_BR_, PRC_BR_, ABB_, DBB_, clk;
reg    CPU_BG_, CPU_DBG_, PRC_BG_, PRC_DBG_;
wire   CPU_BR_, PRC_BR_, clk;

//Declare variables, constants, parameters 
parameter TRUE  = 1'b1,
          FALSE = 1'b0,
          hi    = 1'b1,
          low   = 1'b0;

reg [1:0] requests; //concatenated input signals 
reg [1:0] depth; 
tri stall; 

//Finite State Machine variables and parameters 
reg [2:0] state, next_state; 
parameter start = 1, 
grant_cpu_a = 2, 
          park_cpu = 3,
grant_cpu_d = 4, 
grant_prc_a = 5, 
wait_for_prc = 6, 
grant_prc_d = 7; 
//Initialize outputs
initial
begin
  CPU_BG_ <= hi;
  CPU_DBG_ <= hi;
  PRC_BG_ <= hi;
  PRC_DBG_ <= hi;
  state <= start;
  next_state <= start;
  requests <= 'b11;
  depth <= 0;
end

//Track depth of pipeline
always @(posedge ABB_)
begin
  depth = depth + 1;
end

always @(posedge DBB_)
begin
  depth = depth - 1;
end

assign stall = (depth > 1);

// 

//Arbitration

always
begin
  wait (!stall);
  #5 state = next_state;
  #1 case (state)
    start: //1
    begin
      CPU_BG_ <= hi;
      CPU_DBG_ <= hi;
      PRC_BG_ <= hi;
      PRC_DBG_ <= hi;
      @(posedge clk) requests = {CPU_BR_,PRC_BR_};
      case (requests)
        2'b00: next_state = grant_cpu_a;
        2'b01: next_state = grant_cpu_a;
        2'b10: next_state = grant_prc_a;
        2'b11: next_state = grant_cpu_a;
      endcase
    end



    grant_cpu_a: //2
    begin
      CPU_BG_ <= low;
      CPU_DBG_ <= hi;
      PRC_BG_ <= hi;
      PRC_DBG_ <= hi;
      @(posedge clk);
      next_state = park_cpu;
    end

    park_cpu: //3
    begin
      CPU_BG_ <= low;
      CPU_DBG_ <= low;
      PRC_BG_ <= hi;
      PRC_DBG_ <= hi;
      @(posedge clk) requests = {CPU_BR_,PRC_BR_};
      case (requests)
        2'b00: next_state = park_cpu;
        2'b01: next_state = park_cpu;
        2'b10: next_state = grant_cpu_d;
        2'b11: next_state = park_cpu;
      endcase
    end

    grant_cpu_d: //4
    begin
      CPU_BG_ <= hi;
      CPU_DBG_ <= low;
      PRC_BG_ <= hi;
      PRC_DBG_ <= hi;
      @(posedge clk) requests = {CPU_BR_,PRC_BR_};
      case (requests)
        2'b00: next_state = park_cpu;
        2'b01: next_state = park_cpu;
        2'b10: next_state = grant_prc_a;
        2'b11: next_state = park_cpu;
      endcase
    end

    grant_prc_a: //5
    begin
      CPU_BG_ <= hi;
      CPU_DBG_ <= hi;
      PRC_BG_ <= low;
      PRC_DBG_ <= hi;
      @(posedge clk);
      next_state = wait_for_prc;
    end



wait_for_prc: //6 
begin 

CPU_BG_ <= hi; 

CPU_DBG_ <= hi; 

PRC_BG_ <= low; 

PRC_DBG_ <= low; 

@(posedge elk) requests = {CPU_BR_,PRC_BR_}; 
case (requests) 

2’b00: next_state = wait_for_prc; 

2'b01: next_state = grant_cpu_d; 

2'blO: next_state = wait_for_prc; 

2'bl 1: next_state = grant_prc_d; 
endcase 
end 

grant_prc_d: //7 
begin 

CPU_BG_ <= hi; 

CPU_DBG_ <= hi; 

PRC_BG_ <= hi; 

PRC_DBG_ <= low; 
wait (DBB_ = hi); 

@(posedge elk) requests = {CPU_BR_vPRC_BR_}; 
case (requests) 

2’b00: next_state = grant_cpu_a; 

2’b01: next_state = grant_cpu_a; 

2'blO: next_state = grant_prc_a; 

2'bl 1: next_state = grant_cpu_a; 
endcase 
end 

default: $display("state error in module arbiter"); 
endcase 



endmodule 
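The arbiter's case statements amount to a next-state table indexed by the active-low request pair {CPU_BR_, PRC_BR_}. The Python sketch below restates that table so the grant sequence can be traced by hand. It is a behavioral aid only. It assumes one request sample per clock and does not model the stall check on pipeline depth or the wait for DBB_ in grant_prc_d. The names next_state and run are hypothetical.

```python
# Hypothetical behavioral sketch of the arbiter's next-state choices.
# requests = {CPU_BR_, PRC_BR_}; both bus requests are active low (0 = requesting).
# Not modeled: the stall check on pipeline depth and the wait for DBB_ in grant_prc_d.

NEXT_STATE = {
    #                 00              01              10              11
    "stan":         ("grant_cpu_a",  "grant_cpu_a",  "grant_prc_a",  "grant_cpu_a"),
    "park_cpu":     ("park_cpu",     "park_cpu",     "grant_cpu_d",  "park_cpu"),
    "grant_cpu_d":  ("park_cpu",     "park_cpu",     "grant_prc_a",  "park_cpu"),
    "wait_for_prc": ("wait_for_prc", "grant_cpu_d",  "wait_for_prc", "grant_prc_d"),
    "grant_prc_d":  ("grant_cpu_a",  "grant_cpu_a",  "grant_prc_a",  "grant_cpu_a"),
}

# Address-grant states advance after one clock regardless of the requests.
UNCONDITIONAL = {"grant_cpu_a": "park_cpu", "grant_prc_a": "wait_for_prc"}


def next_state(state: str, cpu_br_n: int, prc_br_n: int) -> str:
    """Return the arbiter state chosen at the next rising clock edge."""
    if state in UNCONDITIONAL:
        return UNCONDITIONAL[state]
    requests = (cpu_br_n << 1) | prc_br_n  # {CPU_BR_, PRC_BR_} as a 2-bit value
    return NEXT_STATE[state][requests]


def run(samples, state: str = "stan") -> list:
    """Apply one (CPU_BR_, PRC_BR_) sample per clock and return every state visited."""
    visited = [state]
    for cpu_br_n, prc_br_n in samples:
        state = next_state(state, cpu_br_n, prc_br_n)
        visited.append(state)
    return visited


if __name__ == "__main__":
    # The PRC requests the bus while the CPU is idle, then releases its request.
    print(run([(1, 0), (1, 0), (1, 1), (1, 1)]))
```

In this trace the PRC is granted the address bus, waits in wait_for_prc until it negates its request, receives the data bus in grant_prc_d, and the arbiter then parks the bus on the CPU again via grant_cpu_a.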



D. MEMORY



/******************************************************************************
* RANDOM ACCESS MEMORY
* Filename: memory.v
* Author: Joseph R. Robert, Jr.
* Date: 24AUG95
* Revised: 10JAN96
*
* Purpose: This module emulates the system's main memory. For simulation
* efficiency, the memory has only enough physical address space for four burst
* reads, that is, 128 bytes. The address bus width allows a virtual address
* space of 4 Gbytes. Accesses to addresses past the first 128 bytes map to
* within the first 128 bytes.
* The time required for a memory access is determined by Delay1 and Delay2.
* Delay1 is the delay, in clock cycles, required for the initial access.
* Delay2 is the delay required for each successive beat of a four-beat
* operation. Set them both to 0 for the fastest memory response. Set them to 8
* and 3, respectively, for the realistic response of a 60 ns DRAM. Do not set
* Delay2 > Delay1. That does not represent a realistic memory response, and
* will probably cause this module to behave incorrectly.
* There is a two-stage pipeline involved with memory accesses, such that an
* address tenure can be started while the previous data tenure is still active.
* To accomplish this, some signals have [0:1] in their declaration, and are
* indexed using pp and dpp, which are the address pipeline position pointer
* and the data pipeline position pointer, respectively.
* To keep this model simple, a single-beat read always returns a single byte
* of data in byte lane 0, regardless of TSIZ, which is different from the way
* the PowerPC really operates. See Table 10-4 on pg. 10-15 of the PowerPC 603
* User's Manual for the actual alignment. This simplification is irrelevant to
* the performance of the PRC, which deals only with burst operations.
* It is important to note that this memory module had to have one feature
* that is not typical of memory modules. It has a CANX input, which cancels the
* current read operation. It is through this signal that the PRC stops the
* memory module from delivering data to the CPU when the PRC already has the
* data.
*
******************************************************************************/

module memory (ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,GBL_,CI_,WT_,CSE,AACK_,
               DBWO_,DBB_,D,DP,DPE_,DBDIS_,TA_,TEA_,CANX,clk);

//Signals are defined in system.v.
output AACK_,DBDIS_,TA_,APE_;
input [0:1] TC;
input DBWO_,CI_,WT_,CSE,TEA_,DPE_,CANX,clk;
input [0:31] A;
inout [0:63] D;
inout [0:7] DP;
input [0:4] TT;
inout [0:3] AP;
inout [0:2] TSIZ;
inout ABB_,TS_,TBST_,GBL_,DBB_;
wire [0:31] A;
wire CI_,WT_,CSE,TEA_,DPE_,ARTRY_;
reg AACK_,APE_,DBDIS_,DRTRY_;
tri [0:63] D;
tri [0:7] DP;
tri [0:3] AP;
tri [0:2] TSIZ;
tri ABB_,TS_,TBST_,GBL_,DBB_,TA_;
reg [0:63] d_reg, data;
assign D = d_reg;

//

//Declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0,
          Size = 128,  //Size of memory in bytes.
          Length = 7,  //Length of physical address in bits.
          Delay1 = 8,  //Delay for address translation.
          Delay2 = 3;  //Delay between successive beats.

parameter //for Transfer_type
          none = 5'bzzzzz,
          write = 5'b00010,
          write_atomic = 5'b10010,
          read = 5'b01010,
          read_atomic = 5'b11010,
          burst_write = 5'b00110,
          burst_read = 5'b01110,
          burst_read_atomic = 5'b11110;

reg [0:31] virtual_addr, index;
reg [0:3] addr_parity_calc,addr_parity_in;
reg [0:Length-1] pa_reg, physical_addr [0:1];
reg [0:7] mem [0:Size-1],
          mem_reg; //Memory data register
reg [0:4] Transfer_type [0:1];
reg [0:2] Transfer_size [0:1];
reg burst [0:1];
reg [0:1] i, burst_start;

reg pp,dpp; //current pipeline and data pipeline positions
reg abort;
reg ta_reg_;
assign TA_ = ta_reg_;

//Initialize memory
initial
begin
  abort <= FALSE;
  AACK_ <= hi;
  addr_parity_calc <= 4'bz;
  addr_parity_in <= 4'bz;
  DBDIS_ <= hi;
  ta_reg_ <= 'bz;
  d_reg <= 64'bz;
  Transfer_type[0] <= none;
  Transfer_type[1] <= none;
  Transfer_size[0] <= 'bz;
  Transfer_size[1] <= 'bz;
  burst[0] <= 'bz;
  burst[1] <= 'bz;
  pp <= 1'b1;
  dpp <= 1'b1;
  for (index = 0; index < Size; index = index + 1)
    mem[index] = index;
end



//

//ADDRESS TENURE
always @(posedge clk)
begin
  if (ABB_ == low)
    begin
      //latch address and attributes
      pp = ~pp;
      Transfer_type[pp] <= TT;
      Transfer_size[pp] <= TSIZ;
      burst[pp] <= TBST_;
      //Insert other attributes here.
      //Blocking assignments, so that the parity check below compares this
      //transaction's parity rather than the previous transaction's.
      addr_parity_in = AP;
      virtual_addr = A;
      addr_parity_calc[0] = ~^virtual_addr[0:7];
      addr_parity_calc[1] = ~^virtual_addr[8:15];
      addr_parity_calc[2] = ~^virtual_addr[16:23];
      addr_parity_calc[3] = ~^virtual_addr[24:31];
      physical_addr[pp] = virtual_addr[32-Length:31];
      if (addr_parity_in != addr_parity_calc)
        begin
          $display("Memory: address parity error.");
          $display("  Calculated parity: %b", addr_parity_calc);
          $display("  Received parity: %b", addr_parity_in);
        end
      AACK_ = #7 low;
      wait (AACK_ == hi);
    end
end

always @(posedge clk)
begin
  if (AACK_ == low)
    AACK_ = #7 hi;
end



//DATA TENURE
always @(posedge clk)
begin
  if (CANX == hi)
    abort = TRUE;
end

always
begin
  #1 dpp = ~dpp;
  #1 case (Transfer_type[dpp])
    none: begin end

    read:
      begin
        repeat(Delay1) @(posedge clk);
        #7 ta_reg_ <= low;
        d_reg[0:7] <= mem[physical_addr[dpp]];
        Transfer_size[dpp] <= 'bz;
        @(posedge clk) Transfer_type[dpp] <= none;
        #7 ta_reg_ = 'bz;
        d_reg[0:7] <= 'bz;
      end

    write:
      begin
        repeat(Delay1) @(posedge clk);
        #7 ta_reg_ <= low;
        //latch data
        @(posedge clk) data = D;
        mem[physical_addr[dpp]] <= data[0:7];
        #7 ta_reg_ = 'bz;
        Transfer_size[dpp] <= 'bz;
        Transfer_type[dpp] <= none;
      end

    burst_read:
      begin
        //find critical double-word
        #2 pa_reg = physical_addr[dpp];
        burst_start = pa_reg[Length-5:Length-4];
        //align to cache line
        pa_reg[Length-5:Length-1] = 5'b00000;
        physical_addr[dpp] = pa_reg;
        if (!abort) if (Delay1-Delay2-1 >= 0)
          repeat(Delay1-Delay2-1) @(posedge clk);
        for (index=0; index<4; index=index+1)
          begin
            if (!abort) repeat(Delay2) @(posedge clk);
            if (Delay1-Delay2 != 0 || index != 0) @(posedge clk);
            if (!abort)
              begin
                #7 ta_reg_ <= low;
                i = burst_start+index; //i is mod 4
                d_reg[0:7]   <= mem[physical_addr[dpp]+8*i];
                d_reg[8:15]  <= mem[physical_addr[dpp]+8*i+1];
                d_reg[16:23] <= mem[physical_addr[dpp]+8*i+2];
                d_reg[24:31] <= mem[physical_addr[dpp]+8*i+3];
                d_reg[32:39] <= mem[physical_addr[dpp]+8*i+4];
                d_reg[40:47] <= mem[physical_addr[dpp]+8*i+5];
                d_reg[48:55] <= mem[physical_addr[dpp]+8*i+6];
                d_reg[56:63] <= mem[physical_addr[dpp]+8*i+7];
                if (Delay2 != 0)
                  begin
                    ta_reg_ <= #13 'bz;
                    d_reg <= #13 64'bz;
                  end
              end
            else
              index <= 5;
          end
        @(posedge clk) ta_reg_ <= #7 'bz;
        d_reg <= #7 64'bz;
        Transfer_size[dpp] <= 'bz;
        Transfer_type[dpp] <= none;
        abort <= FALSE;
      end

    burst_write:
      begin
        //burst-writes are always performed in order
        if (Delay1-Delay2 >= 0)
          repeat(Delay1-Delay2) @(posedge clk);
        for (index=0; index<4; index=index+1)
          begin
            repeat(Delay2) @(posedge clk);
            #7 ta_reg_ <= low;
            i = index;
            @(posedge clk) data = D; //latch data
            mem[physical_addr[dpp]+8*i]   <= data[0:7];
            mem[physical_addr[dpp]+8*i+1] <= data[8:15];
            mem[physical_addr[dpp]+8*i+2] <= data[16:23];
            mem[physical_addr[dpp]+8*i+3] <= data[24:31];
            mem[physical_addr[dpp]+8*i+4] <= data[32:39];
            mem[physical_addr[dpp]+8*i+5] <= data[40:47];
            mem[physical_addr[dpp]+8*i+6] <= data[48:55];
            mem[physical_addr[dpp]+8*i+7] <= data[56:63];
            if (Delay2 != 0)
              ta_reg_ <= #7 'bz;
          end
        ta_reg_ <= #7 'bz;
        data <= #7 64'bz;
        Transfer_size[dpp] <= 'bz;
        Transfer_type[dpp] <= none;
        @(posedge clk);
      end

    default: $display("Memory module received bad TT[%d] = %b at time %d.",
                      dpp, Transfer_type[dpp], $time);
  endcase
end

endmodule
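The burst_read branch delivers the requested (critical) double word first and then wraps around the 32-byte line. The Python sketch below computes the same beat order and data. It assumes the module's Size = 128 and Length = 7 parameters, the initial contents mem[index] = index, and big-endian byte lanes, in which d_reg[0:7] is the most significant byte of each beat. The function names are hypothetical.

```python
# Hypothetical sketch of the memory model's address mapping and burst-read order.
SIZE = 128        # Size parameter: bytes of physical memory
LENGTH = 7        # Length parameter: physical address bits
LINE_BYTES = 32   # one cache line = four 8-byte beats


def physical(addr: int) -> int:
    """Keep the low LENGTH bits, so every address aliases into the first 128 bytes."""
    return addr & ((1 << LENGTH) - 1)


def burst_beat_addresses(addr: int) -> list:
    """Byte address of each beat, critical double word first, wrapping mod 4."""
    pa = physical(addr)
    burst_start = (pa >> 3) & 0b11        # pa_reg[Length-5:Length-4]
    line_base = pa & ~(LINE_BYTES - 1)    # pa_reg[Length-5:Length-1] cleared
    return [line_base + 8 * ((burst_start + k) % 4) for k in range(4)]


def burst_read(mem: list, addr: int) -> list:
    """Four 64-bit beats; the byte at the beat address lands in d_reg[0:7] (most significant)."""
    return [int.from_bytes(bytes(mem[b:b + 8]), "big") for b in burst_beat_addresses(addr)]


# The initial block fills the memory with mem[index] = index.
mem = list(range(SIZE))

if __name__ == "__main__":
    print([hex(a) for a in burst_beat_addresses(0x1048)])
    print(hex(burst_read(mem, 0x48)[0]))
```

A request for 0x1048 aliases to physical byte 0x48. That byte lies in double word 1 of the line at 0x40, so the beats come from 0x48, 0x50, 0x58, and finally 0x40.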
APPENDIX C. PRC BEHAVIOR FILES



The files in this appendix are the result of the behavioral design phase.
They include the Verilog behavioral models of the PRC and the test results.
The files are located on the Computer Center system at
joshua_u2/jrrobert/thesis/verilog/behavior.



A. PRC 



/******************************************************************************
* Predictive Read Cache
* Filename: prc.v
* Author: Joseph R. Robert, Jr.
* Date: 02OCT95
* Revised: 10JAN96
*
* Purpose: This module emulates the predictive read cache.
*
******************************************************************************/

module prc(CPU_BR_,BR_,BG_,ABB_,TS_,A,AP,APE_,TT,TSIZ,TC,TBST_,AACK_,
           DBG_,DBB_,D,DP,DPE_,TA_,HRESET_,CANX,clk);

//Signals are defined in system.v. Notations follow conventions used in the
//PowerPC User's Manual.

input CPU_BR_,BG_,AACK_,DBG_,TA_,HRESET_,clk;
output [0:1] TC;
output BR_,APE_,DPE_,CANX;
inout [0:31] A;
inout [0:63] D;
inout [0:7] DP;
inout [0:4] TT;
inout [0:3] AP;
inout [0:2] TSIZ;
inout ABB_,TS_,TBST_,DBB_;

wire [0:1] TC;
wire BR_,APE_,DPE_,CANX;
wire [0:31] A;
wire [0:63] D;
wire [0:7] DP;
wire [0:4] TT;
wire [0:3] AP;
wire [0:2] TSIZ;
wire ABB_,TS_,TBST_,DBB_;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0;

//Other internal control signals
wire CAR_latch, predict,snoop_ignore;
wire [0:255] DATALINE;
wire [0:26] CAR;   //current address register
wire [0:26] NAR;   //next address register
wire [0:26] MRMA;  //most recent memory access
wire [0:6] ActiveLine;
wire [0:1] BURSTSTART;

//Connect parts
bus_interface BIU1(NAR,BURSTSTART,BG_,CPU_BR_,AACK_,DBG_,
                   send,fetch,clk,BR_,upload,download,fetch_done,
                   send_done,CANX,snoop_ignore,
                   DATALINE,D,A,DP,DPE_,TT,TSIZ,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);
snooper SNP1(A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART,read,write);
controller CON1(HRESET_,read,write,hit,send_done,fetch_done,
                line_empty,a_select,test,predict,store,
                flush,send,hold,new_replace,fetch,clk);
predictor PRE1(MRMA,CAR,predict,NAR);
line_mgr LM1(CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
             new_replace,MRMA,ActiveLine,line_empty,hit);
datalist DL1(DATALINE,ActiveLine,upload,download);

endmodule



B. CONTROLLER



/******************************************************************************
* CONTROLLER
* Filename: controller.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 05JAN96
*
* Purpose: This module is a Finite State Machine which coordinates the actions
* of all the other functional blocks of the PRC. All control signals are
* synchronous with the system clock. HRESET_ causes the Controller to go to
* the IDLE state. See state diagram and state output tables.
*
******************************************************************************/

module controller (HRESET_,read,write,hit,send_done,fetch_done,
                   line_empty,a_select,test,predict,store,
                   flush,send,hold,new_replace,fetch,clk);

input HRESET_,read,write,hit,send_done,fetch_done,line_empty,clk;
output a_select,test,predict,store,flush,send,hold,new_replace,fetch;

reg a_select,test,predict,store,flush,send,hold,new_replace,fetch;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0,
          trace = FALSE;

//Finite State Machine variables and parameters
reg [0:3] state, next_state;
reg [0:2] inputs3;
reg [0:1] inputs2;
reg input1;
parameter idle = 0,
          test_car_r = 1,
          send_data = 2,
          test_nar = 3,
          fetch_data = 4,
          is_line_empty = 5,
          predict_na = 6,
          store_car = 7,
          test_car_w = 8,
          flush_line = 9;

//initialize signals
initial
begin
  state <= idle;       //The state variables must be initialized to
  next_state <= idle;  //avoid the default error message.
end

//FINITE STATE MACHINE
always @(negedge HRESET_)
begin
  state <= idle;
  next_state <= idle;
  wait (HRESET_ == hi);
end
always
begin
  #2 state = next_state;
  if (trace)
    $display("Controller entered state %d.", state);
  #1 case (state)
    idle: //0
      begin
        //a_select <= low;
        test <= low;
        predict <= low;
        store <= low;
        flush <= low;
        send <= low;
        hold <= low;
        new_replace <= low;
        fetch <= low;
        @(posedge clk) inputs2 = {read, write};
        if (HRESET_ == low)
          next_state = idle;
        else
          case (inputs2)
            2'b00: next_state = idle;
            2'b01: next_state = test_car_w;
            2'b10: next_state = test_car_r;
            2'b11: next_state = test_car_w; //This should not happen.
          endcase
      end

    test_car_r: //1
      begin
        a_select <= low; //CAR
        test <= hi;
        predict <= low;
        store <= low;
        flush <= low;
        send <= low;
        hold <= hi;
        new_replace <= low;
        fetch <= low;
        @(posedge clk) input1 = hit;
        case (input1)
          1'b0: next_state = is_line_empty;
          1'b1: next_state = send_data;
        endcase
      end

    send_data: //2
      begin
        //a_select <= low;
        test <= low;
        predict <= hi;
        store <= low;
        flush <= low;
        send <= hi;
        hold <= hi;
        new_replace <= low;
        fetch <= low;
        @(posedge clk) input1 = send_done;
        case (input1)
          1'b0: next_state = send_data;
          1'b1: next_state = test_nar;
        endcase
      end

    test_nar: //3
      begin
        a_select <= hi; //NAR
        test <= hi;
        predict <= low;
        store <= low;
        flush <= low;
        send <= low;
        hold <= hi;
        new_replace <= low;
        fetch <= low;
        @(posedge clk) inputs3 = {hit, read, write};
        case (inputs3)
          3'b000: next_state = fetch_data;
          3'b001: next_state = idle;
          3'b010: next_state = idle;
          3'b011: next_state = idle; //This should not happen.
          3'b100: next_state = idle;
          3'b101: next_state = idle;
          3'b110: next_state = idle;
          3'b111: next_state = idle; //This should not happen.
        endcase
      end

    fetch_data: //4
      begin
        a_select <= hi; //NAR
        test <= low;
        predict <= low;
        store <= low;
        flush <= low;
        send <= low;
        hold <= hi;
        new_replace <= low;
        fetch <= hi;
        @(posedge clk) input1 = fetch_done;
        case (input1)
          1'b0: next_state = fetch_data;
          1'b1: next_state = idle;
        endcase
      end

    is_line_empty: //5
      begin
        //a_select <= low;
        test <= low;
        predict <= low;
        store <= low;
        flush <= low;
        send <= low;
        hold <= hi;
        new_replace <= low;
        fetch <= low;
        @(posedge clk) input1 = line_empty;
        case (input1)
          1'b0: next_state = predict_na;
          1'b1: next_state = store_car;
        endcase
      end

    predict_na: //6
      begin
        //a_select <= low;
        test <= low;
        predict <= hi;
        store <= low;
        flush <= low;
        send <= low;
        hold <= hi;
        new_replace <= hi;
        fetch <= low;
        @(posedge clk) next_state = test_nar;
      end

    store_car: //7
      begin
        a_select <= low; //CAR
        test <= low;
        predict <= low;
        store <= hi;
        flush <= low;
        send <= low;
        hold <= hi;
        new_replace <= low;
        fetch <= low;
        @(posedge clk) next_state = idle;
      end

    test_car_w: //8
      begin
        a_select <= low; //CAR
        test <= hi;
        predict <= low;
        store <= low;
        flush <= low;
        send <= low;
        hold <= hi;
        new_replace <= low;
        fetch <= low;
        @(posedge clk) input1 = hit;
        case (input1)
          1'b0: next_state = idle;
          1'b1: next_state = flush_line;
        endcase
      end

    flush_line: //9
      begin
        //a_select <= low;
        test <= low;
        predict <= low;
        store <= low;
        flush <= hi;
        send <= low;
        hold <= hi;
        new_replace <= low;
        fetch <= low;
        @(posedge clk) next_state = idle;
      end

    default:
      begin
        $display("state error in module controller.");
        $display("  state = %b.", state);
      end
  endcase
end

endmodule
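Taken together, the Controller's ten case branches reduce to a small next-state function. The Python sketch below restates it, using the state names from the parameter list. The function name controller_next and its keyword arguments are hypothetical, and the output signals and exact clock timing are not modeled. The usage example traces a read hit followed by a prefetch of the predicted line.

```python
# Hypothetical restatement of the PRC Controller's next-state logic.
# Inputs are sampled once per state at the rising clock edge, as in controller.v.

def controller_next(state, read=0, write=0, hit=0, send_done=0,
                    fetch_done=0, line_empty=0, hreset_n=1):
    if state == "idle":
        if not hreset_n:
            return "idle"
        return {(0, 0): "idle", (0, 1): "test_car_w",
                (1, 0): "test_car_r", (1, 1): "test_car_w"}[(read, write)]
    if state == "test_car_r":
        return "send_data" if hit else "is_line_empty"
    if state == "send_data":
        return "test_nar" if send_done else "send_data"
    if state == "test_nar":
        # Fetch only if the predicted line is absent and no new transaction is pending.
        return "fetch_data" if (hit, read, write) == (0, 0, 0) else "idle"
    if state == "fetch_data":
        return "idle" if fetch_done else "fetch_data"
    if state == "is_line_empty":
        return "store_car" if line_empty else "predict_na"
    if state == "predict_na":
        return "test_nar"
    if state in ("store_car", "flush_line"):
        return "idle"
    if state == "test_car_w":
        return "flush_line" if hit else "idle"
    raise ValueError("state error in module controller: " + state)


if __name__ == "__main__":
    s = "idle"
    path = [s]
    for inputs in ({"read": 1}, {"hit": 1}, {"send_done": 1}, {}, {"fetch_done": 1}):
        s = controller_next(s, **inputs)
        path.append(s)
    print(path)
```

The trace passes through idle, test_car_r, send_data, test_nar, and fetch_data, then returns to idle. The PRC serves the hit, tests the predicted address, and fetches it because it is not already present.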
C. SNOOPER



/******************************************************************************
* SNOOPER
* Filename: snooper.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 05JAN96
*
* Purpose: This module watches the system bus activity and makes appropriate
* reports to the PRC Controller.
* If the transaction is a data burst read or any kind of write, and if the
* address parity is correct, then the read or write signal is asserted as
* appropriate, and the address is placed in the CAR. The snoop_ignore signal
* tells this unit to ignore the current transaction because it was initiated
* by the Bus Interface Unit. The snoop_ignore signal must be asserted
* concurrently with the transfer attributes.
* Reads that are not burst or data related are ignored by the PRC. The CAR
* is updated only on transactions relevant to the PRC.
* Due to the two-stage pipelining capability of the PowerPC with respect to
* memory accesses, a second address tenure can occur shortly after the first,
* well before the first data tenure is complete. To compensate for this, the
* read and write outputs of the Snooper remain asserted until acknowledged
* by the Controller with hold. The rising edge of hold indicates that the read
* or write signal was received by the Controller. The Snooper can then negate
* these signals, but must leave CAR alone until hold is negated. After hold is
* negated, CAR can be updated to the new address.
*
******************************************************************************/

module snooper (A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART,
                read_flag,write_flag);

input [0:31] A;
input [0:3] AP;
input [0:4] TT;
input [0:1] TC;
input TS_,snoop_ignore,hold,clk;
output [0:26] CAR;
output [0:1] BURSTSTART;
output read_flag,write_flag;

reg [0:26] CAR;
reg [0:1] BURSTSTART;
reg read_flag,write_flag;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0;

//Address related
reg [0:31] address;
reg [0:3] addr_parity,addr_parity_calc;

//Other external control signals
reg [0:4] Transfer_type;
parameter //for Transfer_type
          none = 5'bz,
          write = 5'b00010,              //02
          write_atomic = 5'b10010,       //12
          read = 5'b01010,               //0A
          read_atomic = 5'b11010,        //1A
          burst_write = 5'b00110,        //06
          burst_read = 5'b01110,         //0E
          burst_read_atomic = 5'b11110;  //1E
reg [0:1] Transfer_code;
parameter //for Transfer_code
          data_transfer = 2'b00,
          touch_load = 2'b01,
          instruction_fetch = 2'b10,
          reserved = 2'b11;

reg ignore;

//Other internal control signals
reg valid_read_0, valid_read_1; //The numbers indicate the pipeline stage.
reg valid_write_0, valid_write_1;
tri parity_valid;
reg Transaction_waiting;

//initialize variables
initial
begin
  CAR <= 27'bz;
  BURSTSTART <= 2'bz;
  read_flag <= low;
  write_flag <= low;
  address <= 32'bz;
  addr_parity <= 4'bz;
  addr_parity_calc <= 4'bz;
  Transfer_type <= none;
  Transfer_code <= none;
  ignore <= low;
  Transaction_waiting <= low;
end

//BEHAVIOR
//Calculate address parity (odd parity, one bit per address byte).
always @(address)
begin
  addr_parity_calc[0] <= ~^address[0:7];
  addr_parity_calc[1] <= ~^address[8:15];
  addr_parity_calc[2] <= ~^address[16:23];
  addr_parity_calc[3] <= ~^address[24:31];
end

assign parity_valid = (addr_parity_calc == addr_parity);

//If there is a transaction,
//  and that transaction is a data burst read or any kind of write,
//  and the transaction is not initiated by the PRC itself,
//  and the address parity is correct,
//then report the type of transaction to the Controller.
always @(posedge clk)
begin
  if (TS_ == low)
    begin //latch address and attributes in stage 0.
      address <= A;
      Transfer_type <= TT;
      Transfer_code <= TC;
      ignore <= snoop_ignore;
      addr_parity = AP;
      #2 valid_read_0 = (Transfer_code == data_transfer) &&
                        ((Transfer_type == burst_read) ||
                         (Transfer_type == burst_read_atomic));
      valid_write_0 = (Transfer_type == write) ||
                      (Transfer_type == write_atomic) ||
                      (Transfer_type == burst_write);
      #4 if (!ignore && parity_valid && (valid_read_0 || valid_write_0))
        Transaction_waiting = hi;
    end
end

always @(posedge hold)
begin
  read_flag <= low;
  write_flag <= low;
end

always
begin
  wait (Transaction_waiting);
  valid_read_1 = valid_read_0;
  valid_write_1 = valid_write_0;
  Transaction_waiting = #2 low;
  wait (!hold);
  if (valid_read_1)
    begin
      read_flag <= hi;
      CAR = address[0:26];
      BURSTSTART = address[27:28];
    end
  else if (valid_write_1)
    begin
      write_flag <= hi;
      CAR = address[0:26];
      BURSTSTART = address[27:28];
    end
end

endmodule
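The Snooper's qualification rule can be checked in isolation. It accepts a transaction only with correct odd address parity on every byte, only for a data-transfer burst read or any write, and only when snoop_ignore is negated. The same check covers how it splits the 32-bit address into CAR and BURSTSTART. The Python sketch below assumes PowerPC big-endian bit numbering, in which A[0] is the most significant bit. The function names are hypothetical.

```python
# Hypothetical sketch of the Snooper's transaction qualification.
WRITE, WRITE_ATOMIC, BURST_WRITE = 0b00010, 0b10010, 0b00110
BURST_READ, BURST_READ_ATOMIC = 0b01110, 0b11110
DATA_TRANSFER = 0b00  # TC value for a data transfer


def odd_parity_bits(address: int) -> int:
    """AP[0:3] as a 4-bit value; AP[0] covers A[0:7], the most significant byte.
    Each bit equals ~^byte: 1 when the byte holds an even number of ones."""
    ap = 0
    for k in range(4):
        byte = (address >> (24 - 8 * k)) & 0xFF
        ap = (ap << 1) | (1 if bin(byte).count("1") % 2 == 0 else 0)
    return ap


def split_address(address: int) -> tuple:
    """CAR = A[0:26] (upper 27 bits); BURSTSTART = A[27:28] (double word within the line)."""
    return address >> 5, (address >> 3) & 0b11


def classify(tt: int, tc: int, ap: int, address: int, snoop_ignore: bool = False):
    """Return 'read', 'write', or None, following the valid_read_0/valid_write_0 rules."""
    if snoop_ignore or ap != odd_parity_bits(address):
        return None
    if tc == DATA_TRANSFER and tt in (BURST_READ, BURST_READ_ATOMIC):
        return "read"
    if tt in (WRITE, WRITE_ATOMIC, BURST_WRITE):
        return "write"
    return None


if __name__ == "__main__":
    addr = 0x00000048
    print(bin(odd_parity_bits(addr)), split_address(addr), classify(BURST_READ, 0, 0b1111, addr))
```

As in the Verilog, writes are accepted regardless of the transfer code, while reads must be data-transfer burst reads. A single-beat read (TT = 0A) is therefore ignored.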



D. LINE MANAGER



/******************************************************************************
* LINE MANAGER
* Filename: line_mgr.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 05JAN95
*
* Purpose: This module contains the address list, status flags for each line
* (Valid, Aged), a general status flag (line_empty), the line replacement unit,
* and a couple of pointers (ActiveLine, ReplaceLine).
* The MRMA output is always the MRMA of the ActiveLine. The line_empty
* flag indicates that the currently active line has no addresses in it yet and
* therefore cannot be used by the PRC to make a prediction.
* The input a_select determines which address input is used for a particular
* operation. The two address inputs are the CAR and the NAR.
* When the Line Manager receives a test signal, it compares the input address
* with the contents of the PredMA List. If there is a match with the CAR, it
* asserts the hit signal and changes the ActiveLine pointer to the line number
* of the match.
* If there is a miss with the CAR, then the ActiveLine switches to the line
* pointed to by ReplaceLine.
* If, during a test, there is a match with the NAR, hit is asserted, and the
* value in ActiveLine is irrelevant since it will not be used. If there is a
* miss with the NAR, the ActiveLine must remain unchanged by the test.
* The fetch_done signal from the Bus Interface Unit causes the NAR to be
* stored in PredMA[ActiveLine], the CAR to be stored in MRMA[ActiveLine], the
* Valid flag to be set, and the Aged flag to be reset.
* The flush signal causes the current ActiveLine to become invalid by setting
* Valid[ActiveLine] = 0.
* The store signal causes the input address to be stored into the MRMA of the
* ActiveLine. This is only used for the first address in a new line. Store
* also causes the line_empty flag to be reset.
*
******************************************************************************/

module line_mgr (CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
                 new_replace,MRMA_out,ActiveLine,line_empty,hit);

input [0:26] CAR,NAR;
input HRESET_,a_select,test,fetch_done,flush,store,new_replace;
output [0:26] MRMA_out;
output [0:6] ActiveLine;
output line_empty,hit;

reg [0:26] MRMA_out;
reg [0:6] ActiveLine;
reg line_empty,hit;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0;

//Address related
reg [0:26] in_addr;

//Data structure
reg [0:26] PredMA [0:127],
           MRMA [0:127],
           PredMA_reg,MRMA_reg;
reg Valid [0:127],
    Aged [0:127];

//Other internal control signals
reg [0:7] i1,i2,i3;
reg [0:6] ReplaceLine;
reg match,temp,all_lines_are_valid,done;

//initialize variables
initial
begin
  for (i1=0; i1<=127; i1=i1+1)
    begin
      PredMA[i1] <= 27'b0;
      MRMA[i1] <= 27'b0;
    end
end
//BEHAVIOR
always @(negedge HRESET_)
begin
  for (i1=0; i1<=127; i1=i1+1)
    begin
      Valid[i1] <= low;
      Aged[i1] <= low;
    end
  ActiveLine <= 0;
  ReplaceLine <= 0;
  line_empty <= hi;
  wait (HRESET_ == hi);
end

always @(a_select or CAR or NAR) //address multiplexer
begin
  if (a_select == 0)
    in_addr = CAR;
  else
    in_addr = NAR;
end

always @(ActiveLine)
begin
  MRMA_out = MRMA[ActiveLine];
  $display("Line_mgr selected new ActiveLine = %d at %d.", ActiveLine, $time);
end

always @(posedge test)
begin
  hit = low;
  match = low;
  #2 i2 = 0;
  while (!match && i2 < 128)
    if (PredMA[i2] == in_addr && Valid[i2])
      match = hi;
    else
      i2 = i2 + 1;
  #2 if (match && a_select == 0)     //a match with the CAR
    begin
      hit <= hi;
      ActiveLine <= i2;
    end
  else if (match && a_select == 1)   //a match with the NAR
    hit <= hi;
  else if (!match && a_select == 0)  //a miss with the CAR
    ActiveLine <= ReplaceLine;
  else if (!match && a_select == 1)  //a miss with the NAR
    begin end //Do nothing.
end

always @(posedge fetch_done)
begin
  MRMA[ActiveLine] <= CAR;
  MRMA_out <= CAR;
  PredMA[ActiveLine] <= NAR;
  Valid[ActiveLine] <= hi;
  Aged[ActiveLine] = low;
end

always @(posedge flush)
begin
  Valid[ActiveLine] = low;
  $display("Line manager flushed line %d at time %d.", ActiveLine, $time);
end

always @(posedge store)
begin
  MRMA[ActiveLine] = in_addr;
  MRMA_out = MRMA[ActiveLine];
  line_empty = 0;
end

/******************************************************************************
* LINE REPLACEMENT UNIT
*
* ReplaceLine always points to the line to be replaced at the next PRC miss.
* As soon as the PRC starts predicting the first address for a line, it
* asserts new_replace, and the Line Replacement Unit can then find a new line
* to mark as the next ReplaceLine. It searches sequentially for the next line
* with invalid data and marks that line as the next to be replaced. If all
* lines contain valid data, then it scans for the next line that is "aged",
* indicated by a set Aged flag. As it scans for an aged line, it sets the Aged
* bits in the lines it passes. Therefore, as it wraps around in search of an
* aged line, it will eventually come upon one, even if none were aged when the
* search began.
* All of this occurs while the PRC is fetching data, so it has several clock
* periods in which to complete the search.
******************************************************************************/



always
begin
  temp = TRUE;
  for (i3=0; i3<=127; i3=i3+1)
    if (!Valid[i3])
      temp = FALSE;
  #1 all_lines_are_valid = temp;
end

always @(posedge new_replace) //find the next ReplaceLine
begin
  done = FALSE;
  #2 while (!done)
    begin
      ReplaceLine = ReplaceLine + 1; //mod 128 addition
      if (!Valid[ReplaceLine])
        done = TRUE;
      else if (all_lines_are_valid & Aged[ReplaceLine])
        done = TRUE;
      else
        Aged[ReplaceLine] = 1;
    end
  line_empty = hi;
end

endmodule
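The search loop always terminates. If some line is invalid, the search reaches it. If every line is valid, each line the search passes is marked aged, so at most one full lap later it meets an aged line. The Python sketch below mirrors the always @(posedge new_replace) block, assuming Valid does not change during the search. The name next_replace_line is hypothetical.

```python
# Hypothetical sketch of the Line Replacement Unit search (always @(posedge new_replace)).
NUM_LINES = 128


def next_replace_line(replace_line: int, valid: list, aged: list) -> int:
    """Advance ReplaceLine (mod 128) to the next victim; sets Aged on lines it passes."""
    all_lines_are_valid = all(valid)
    while True:
        replace_line = (replace_line + 1) % NUM_LINES   # mod 128 addition
        if not valid[replace_line]:
            return replace_line
        if all_lines_are_valid and aged[replace_line]:
            return replace_line
        aged[replace_line] = True


if __name__ == "__main__":
    valid = [True] * NUM_LINES
    aged = [False] * NUM_LINES
    # Every line valid and none aged: one full lap ages every line, then line 1 is chosen.
    print(next_replace_line(0, valid, aged), all(aged))
```

The worst case, with every line valid and none aged, costs one full lap of 128 steps plus one. This is why the comment above relies on the search running while the PRC is busy fetching.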



E. PREDICTOR 



/******************************************************************************
* PREDICTOR
* Filename: predictor.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 05JAN96
*
* Purpose: This module calculates the Next Address (stored in NAR) based on the
* Most Recent Memory Access (MRMA) and the Current Address (in the CAR). The
* prediction calculation is
*
*     NAR = 2*CAR - MRMA
*
* The calculation is initiated upon each rising edge of the predict signal.
* The output NAR remains latched and valid until the next predict leading edge.
*
******************************************************************************/

module predictor (MRMA,CAR,predict,NAR);

input [0:26] MRMA,CAR;
input predict;
output [0:26] NAR;

reg [0:26] NAR;

parameter TRUE = 1'b1,
          FALSE = 1'b0,
          trace = FALSE;

//behavior
always @(posedge predict)
begin
  NAR = 2*CAR - MRMA;
  if (trace)
    begin
      $display("Predictor: NAR = 2*CAR - MRMA");
      $display("  %h = 2*%h - %h", {NAR,5'b0}, {CAR,5'b0}, {MRMA,5'b0});
    end
end

endmodule
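Because 2*CAR - MRMA equals CAR + (CAR - MRMA), the predictor assumes that the stride between the last two accesses recorded for a line will repeat. The Python sketch below evaluates the formula on 27-bit line addresses with wraparound. In the Verilog, the expression is computed at 32 bits (the constant 2 is a 32-bit integer) and then truncated into the 27-bit NAR, which gives the same result modulo 2^27. The function names are hypothetical.

```python
# Hypothetical sketch of the predictor's constant-stride calculation.
WIDTH = 27                      # CAR, MRMA and NAR are [0:26]
MASK = (1 << WIDTH) - 1


def predict_nar(car: int, mrma: int) -> int:
    """NAR = 2*CAR - MRMA, i.e. CAR + (CAR - MRMA), truncated to 27 bits."""
    return (2 * car - mrma) & MASK


def line_to_byte(line_addr: int) -> int:
    """Byte address of a line address, as in the trace output {NAR,5'b0}."""
    return line_addr << 5


if __name__ == "__main__":
    # Sequential misses one line apart predict the next line.
    nar = predict_nar(car=0x101, mrma=0x100)
    print(hex(nar), hex(line_to_byte(nar)))
```

Descending strides work the same way: predict_nar(0x100, 0x101) yields 0xFF, and a stride of 0x10 lines from 0x140 to 0x150 predicts 0x160.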



F. DATA LIST 



/******************************************************************************
* DATA LIST
* Filename: datalist.v
* Author: Joseph R. Robert, Jr.
* Date: 15DEC95
* Revised: 05JAN96
*
* Purpose: This module emulates the PRC's Data List.
*
* An upload signal causes the Data List to store the data on data_line into
* the address specified by ActiveLine.
* A download signal causes the Data List to assert onto data_line the data in
* the address specified by ActiveLine.
*
******************************************************************************/

module datalist (data_line,ActiveLine,upload,download);

input [0:6] ActiveLine;
input upload, download;
inout [0:255] data_line;

tri [0:255] data_line;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
          FALSE = 1'b0,
          hi = 1'b1,
          low = 1'b0,
          trace = TRUE;

//Data structure
reg [0:255] line [0:127],
            line_reg,
            data_line_reg;

assign data_line = data_line_reg;

//initialize signals
initial
begin
  data_line_reg <= 256'bz;
end

//BEHAVIOR
always @(posedge upload)
begin
  line_reg = data_line;
  line[ActiveLine] = line_reg;
  if (trace)
    begin
      $display("DATALIST uploaded this data into line %h at time %d.",
               ActiveLine, $time);
      $display("  %h", line_reg);
    end
end

always @(posedge download)
begin
  line_reg = line[ActiveLine];
  data_line_reg = line_reg;
  if (trace)
    begin
      $display("DATALIST downloaded this data from line %h at time %d.",
               ActiveLine, $time);
      $display("  %h", line_reg);
    end
end

always @(negedge download)
begin
  data_line_reg = 256'bz;
end

endmodule
G. BUS INTERFACE UNIT



/******************************************************************************
* BUS INTERFACE UNIT
* Filename: bus_interface.v
* Author: Joseph R. Robert, Jr.
* Date: 09OCT95
* Revised: 05JAN96
*
* Purpose: This module connects the PRC with the system bus. It handles
* the protocol of data transfer in and out of the PRC.
* When this module receives a fetch signal, it latches the address in the
* NAR and requests the bus for a burst read. It stores the incoming data
* until all four beats of the burst have been received. Then it uploads the
* data into the Data List and asserts fetch_done.
* When this module receives a send signal, it sends a cancel signal (CANX) to
* the memory module, downloads data from the Data List, and then sends the data
* to the CPU. When the transfer is finished, it asserts send_done.
* The coordination of these activities is accomplished through the use of a
* Finite State Machine.
*
******************************************************************************/

module bus_interface (NAR_IN,BURSTSTART,BG_,CPU_BR_,AACK_,DBG_,
                      send,fetch,clk,BR_,upload,download,fetch_done,
                      send_done,CANX,snoop_ignore,
                      DATALINE,D,A,DP,DPE_,TT,TSIZ,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);

//Signals are defined in system.v.
input [0:26] NAR_IN;
input [0:1] BURSTSTART;
input BG_,CPU_BR_,AACK_,DBG_,send,fetch,clk,HRESET_;
output BR_,upload,download,fetch_done;
output send_done,DPE_,CANX,snoop_ignore;
inout [0:255] DATALINE;
inout [0:63] D;
inout [0:31] A;
inout [0:7] DP;
inout [0:4] TT;
inout [0:2] TSIZ;
inout ABB_,TS_,TBST_,DBB_,TA_;

reg BR_,upload,download,fetch_done,send_done,CANX,snoop_ignore;
tri [0:255] DATALINE;
tri [0:63] D;
tri [0:31] A;
tri [0:7] DP;
tri [0:4] TT;
tri [0:3] AP;
tri [0:2] TSIZ;
tri ABB_,TS_,TBST_,DBB_,TA_,DPE_;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
FALSE = 1'b0,
hi = 1'b1,
low = 1'b0,
trace = TRUE;

//Address related
reg [0:31] NAR;
reg [0:31] a_reg;
assign A = a_reg;
reg [0:3] ap_reg, addr_parity_calc;
assign AP = ap_reg;
reg [0:1] burst_start;

//Data related
reg [0:255] data_line_reg, data_line;
assign DATALINE = data_line_reg;
reg [0:63] d_reg,data_reg;
assign D = d_reg;
reg [0:7] dp_reg, data_parity_calc, data_parity_in;
assign DP = dp_reg;

//Other external control signals
reg [0:4] tt_reg,Transfer_type;
assign TT = tt_reg;
parameter //for Transfer_type
none = 5'bz,
burst_write = 5'b00110, //06
burst_read = 5'b01110, //0E
burst_read_atomic = 5'b11110; //1E
reg [0:2] tsiz_reg;
assign TSIZ = tsiz_reg;
reg abb_reg_,dbb_reg_,ts_reg_,tbst_reg_,ta_reg_;
assign ABB_ = abb_reg_;
assign DBB_ = dbb_reg_;
assign TS_ = ts_reg_;
assign TBST_ = tbst_reg_;
assign TA_ = ta_reg_;

//Other internal control signals
reg [0:2] i; //counter
reg [0:1] j; //counter
wire qual_BG_,qual_DBG_;
reg AB_Master,Transfer_in_progress,Transfer_start,Addr_termination,
Data_Parity_Error;
assign DPE_ = ~Data_Parity_Error;
event transfer_acknowledged,start_send;

//Finite State Machine variable and parameters
reg [0:3] state, next_state;
reg [0:1] inputs2;
reg input1;
parameter idle = 0,
fetch1 = 1,
fetch2 = 2,
fetch3 = 3,
send1 = 5,
send2 = 6;

//initialize signals
initial
begin
BR_ <= hi;
upload <= low;
download <= low;
fetch_done <= low;
CANX <= low;
NAR <= 32'bz;
a_reg <= 32'bz;
ap_reg <= 4'bz;
addr_parity_calc <= 4'bz;
burst_start <= 2'bz;
data_line_reg <= 256'bz;
data_line <= 256'bz;
d_reg <= 64'bz;
data_reg <= 64'bz;
dp_reg <= 8'bz;
data_parity_calc <= 8'bz;
data_parity_in <= 8'bz;
tt_reg <= 5'bz;
tsiz_reg <= 3'bz;
abb_reg_ <= 'bz;
dbb_reg_ <= 'bz;
ts_reg_ <= 'bz;
tbst_reg_ <= 'bz;
ta_reg_ <= 'bz;
i <= 3'bz;
j <= 2'bz;
AB_Master <= low;
Transfer_in_progress <= low;
Transfer_start <= low;
Addr_termination <= low;
Data_Parity_Error <= low;
send_done <= low;
snoop_ignore <= low;
state <= 0;
next_state <= 0;
inputs2 <= 2'bz;
input1 <= 'bz;
end

//ADDRESS BUS ARBITRATION
assign qual_BG_ = ~(!BG_ & ABB_);

//Assume mastership
always @(posedge clk)
if (qual_BG_ == low)
begin
abb_reg_ = #2 low;
AB_Master = TRUE;
BR_ <= #1 hi;
end

//Calculate address parity.
always @(NAR)
begin
addr_parity_calc[0] <= ~^NAR[0:7];
addr_parity_calc[1] <= ~^NAR[8:15];
addr_parity_calc[2] <= ~^NAR[16:23];
addr_parity_calc[3] <= ~^NAR[24:31];
end

//Transfer address
always @(posedge clk)
if (qual_BG_ == low)
begin
ts_reg_ = #7 low;
Transfer_start <= TRUE;
a_reg <= NAR;
ap_reg <= addr_parity_calc;
tt_reg <= burst_read;
tsiz_reg <= 3'b010;
tbst_reg_ <= low;
snoop_ignore <= hi;
if (trace)
$display("BIU started read from address %h at time %d.",
NAR,$time);
end

always @(posedge clk)
if (AB_Master & TS_==low)
begin
ts_reg_ = #7 hi;
wait (AACK_==low);
Addr_termination = TRUE;
end



//Address termination
always @(posedge clk)
if (Addr_termination)
begin
#7 ts_reg_ <= 'bz;
a_reg <= 'bz;
ap_reg <= 'bz;
tt_reg <= 'bz;
tsiz_reg <= 'bz;
tbst_reg_ <= 'bz;
snoop_ignore <= low;
//insert other addr transfer characteristics here.
abb_reg_ <= #2 hi;
abb_reg_ <= #8 'bz;
AB_Master = FALSE;
Addr_termination = FALSE;
end

//DATA BUS ARBITRATION FOR FETCHES
assign qual_DBG_ = ~(!DBG_ & DBB_);

always @(posedge clk)
begin
if (TA_ == low)
-> transfer_acknowledged;
end

//Calculate data parity. Odd parity, including parity bit.
always @(data_reg)
begin
data_parity_calc[0] <= ~^data_reg[0:7];
data_parity_calc[1] <= ~^data_reg[8:15];
data_parity_calc[2] <= ~^data_reg[16:23];
data_parity_calc[3] <= ~^data_reg[24:31];
data_parity_calc[4] <= ~^data_reg[32:39];
data_parity_calc[5] <= ~^data_reg[40:47];
data_parity_calc[6] <= ~^data_reg[48:55];
data_parity_calc[7] <= ~^data_reg[56:63];
end

always
begin
//wait for qualified data bus grant and transfer start.
wait(qual_DBG_==low & Transfer_start);
@(posedge clk) //assume data bus mastership
dbb_reg_ <= #7 low;
i = 0;
while (i<4)
begin
@(transfer_acknowledged) //latch beat
data_reg <= D;
data_parity_in = DP;
#2 if (trace) $display("BIU: %h at %d",data_reg,$time);
#2 if (data_parity_in != data_parity_calc)
begin
$display("BIU: data parity error.");
$display(" Calculated parity: %b",
data_parity_calc);
$display(" Received parity: %b",
data_parity_in);
Data_Parity_Error = TRUE;
i = 4;
end
else
begin
if (i==0) data_line[  0: 63] = data_reg;
if (i==1) data_line[ 64:127] = data_reg;
if (i==2) data_line[128:191] = data_reg;
if (i==3) data_line[192:255] = data_reg;
i = i+1;
end
end
Transfer_in_progress <= FALSE;
Transfer_start <= FALSE;
dbb_reg_ = #4 hi;
dbb_reg_ = #8 'bz;
end



//DATA BUS PROTOCOL FOR SENDS (PRC acting as memory module)
always @(start_send)
begin
i = 0;
while (i<4) begin
@(posedge clk);
#1 ta_reg_ = 'bz;
j = burst_start+i; //j is 2 bits wide, so the sum wraps mod 4
if (j==0) data_reg = data_line[  0: 63];
if (j==1) data_reg = data_line[ 64:127];
if (j==2) data_reg = data_line[128:191];
if (j==3) data_reg = data_line[192:255];
d_reg = data_reg;
#4 dp_reg <= data_parity_calc;
ta_reg_ <= low;
i=i+1;
end
send_done <= hi;
@(posedge clk)
ta_reg_ <= #7 'bz;
d_reg <= #7 64'bz;
dp_reg <= #7 8'bz;
end



//FINITE STATE MACHINE
always @(negedge HRESET_)
begin
if (HRESET_ == low)
begin
state <= idle;
next_state <= idle;
wait(HRESET_ == hi);
end
end

always
begin
#2 state = next_state;
#1 case (state)
idle: //0
begin
upload <= low;
fetch_done <= low;
send_done <= low;
CANX <= low;
data_line_reg <= 256'bz;
@(posedge clk) inputs2 = {send, fetch};
case (inputs2)
2'b00: next_state = idle;
2'b01: next_state = fetch1;
2'b10: next_state = send1;
2'b11: next_state = idle; //This should not happen.
endcase
end

fetch1: //1
begin
//1. Latch next address.
NAR[0:26] <= NAR_IN;
NAR[27:31] <= 5'b0;
//2. Request Bus
BR_ <= low;
Transfer_in_progress <= TRUE;
@(posedge clk)
next_state = fetch2;
end

fetch2: //2
begin
//1. Wait for all data to be received.
@(posedge clk) input1 = Transfer_in_progress;
case (input1)
1'b0: next_state = fetch3;
1'b1: next_state = fetch2;
endcase
end

fetch3: //3
begin
//1. Upload the data line.
data_line_reg <= data_line;
upload <= hi;
//2. Assert fetch_done.
fetch_done <= hi;
@(posedge clk)
next_state = idle;
end

send1: //5
begin
//1. Cancel the memory access.
CANX <= hi;
//2. Latch burst_start.
burst_start <= BURSTSTART;
//3. Download data from the Data List.
download <= hi;
#5 data_line <= DATALINE;
@(posedge clk)
next_state = send2;
end

send2: //6
begin
//1. Send data to CPU
-> start_send;
CANX <= low;
download <= low;
@(posedge clk) input1 = {send_done};
case (input1)
1'b0: next_state = send2;
1'b1: next_state = idle;
endcase
end

default:
begin
$display("state error in module bus_interface.");
$display(" state = %b.",state);
end
endcase
end



endmodule 
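The parity and beat-ordering logic of bus_interface.v can be exercised in isolation. The sketch below is a hypothetical stand-alone module (biu_helpers_sketch, odd_parity64 and wrap_beat are illustrative names, not part of the PRC files). It reproduces the odd-parity rule used for data_parity_calc, one ~^ reduction per byte so that each byte plus its parity bit carries an odd number of ones, and the send-path beat order, in which a two-bit index starting at burst_start wraps modulo 4.

```verilog
// Hypothetical stand-alone sketch; not one of the PRC source files.
module biu_helpers_sketch;

  // Odd parity per byte, using the [0:63] bit numbering of bus_interface.v.
  // A parity bit is 1 when its byte holds an even number of ones, so the
  // byte plus its parity bit always has an odd count of ones.
  function [0:7] odd_parity64;
    input [0:63] d;
    integer k;
    begin
      for (k = 0; k < 8; k = k + 1)
        odd_parity64[k] = ~^d[8*k +: 8];
    end
  endfunction

  // Data List beat sent on the n-th data phase of a send that starts at
  // beat s; the 2-bit result wraps modulo 4, like j in bus_interface.v.
  function [0:1] wrap_beat;
    input [0:1] s;
    input [0:1] n;
    begin
      wrap_beat = s + n;
    end
  endfunction

  initial begin
    $display("parity of %h = %b", 64'h0001020304050607,
             odd_parity64(64'h0001020304050607));
    // prints: parity of 0001020304050607 = 10010110
    $display("beat order from 2: %0d %0d %0d %0d",
             wrap_beat(2, 0), wrap_beat(2, 1), wrap_beat(2, 2), wrap_beat(2, 3));
    // prints: beat order from 2: 2 3 0 1
  end
endmodule
```

A send that starts at beat 2 therefore delivers beats 2, 3, 0, 1, which is the critical-word-first order the while loop in the send block produces.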



H. PREDICTION TEST



/*******************************************************************************
* Transaction Sequencer - Prediction Test
* Filename: sequencer4.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 05JAN96
*
* Purpose: This is one in a set of modules which perform a sequence of CPU
* transactions. This sequencer causes a series of CPU operations that provide
* a comprehensive test of the PRC. It demonstrates a majority of the PRC's
* capabilities, showing when the Line Manager selects new lines, when and how
* the Predictor functions, when the CPU starts a read or write, and the data
* involved. It shows when the Bus Interface Unit fetches data from memory.
* The Data List reports the flow of data in and out of it. The only significant
* behavior not exercised by this test is the function of the Line Replacement
* Unit when the PRC is full. That is handled by Sequencer #5.
*
* Sequence #4:
* burst_read 00h
* burst_read 20h - PRC should predict 40h and fetch data.
* burst_read 180h - PRC should start a new line.
* burst_read 1A0h - PRC should predict 1C0h.
* burst_read 40h - already in PRC, should predict 60h.
* burst_write 1C0h - should flush line.
* burst_read 60h - already in PRC, should predict 80h.
* burst_read 100h - PRC should start a new line.
*
* When using this sequencer, set all trace flags to TRUE (except the
* Controller), and run the simulation for 6000 steps.
*
* General Timing instructions for all Sequencers:
* Use an initial block for each transaction. You must ensure that the
* following rules are adhered to:
* 1. Before the first transaction, use
*    repeat(2)@(posedge clk)
* 2. Before the first line of the second transaction, use
*    wait(ABB_==low);
*    wait(ABB_==hi);
* 3. There can be only two transactions pipelined at a time. You must ensure
*    manually that the first operation is complete before the third begins.
*    When scheduling the current transaction, look at the transaction before
*    last. Wait for that TA_ to finish. Also, wait for the ABB_ from the
*    previous transaction to go high.
* 4. A burst read takes 330 simulation time units = 22 clock cycles.
*
*******************************************************************************/

module sequencer(Transfer_size,clk,pp,address,data,line,Transfer_type,
Transfer_code,need_bus_trigger_,ABB_);

input clk,ABB_;
output pp,need_bus_trigger_;
output [0:31] address;
output [0:63] data;
output [0:255] line;
output [0:4] Transfer_type;
output [0:2] Transfer_size;
output [0:1] Transfer_code;
reg pp,need_bus_trigger_;
reg [0:31] address;
reg [0:63] data;
reg [0:255] line;
reg [0:4] Transfer_type;
reg [0:2] Transfer_size;
reg [0:1] Transfer_code;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
FALSE = 1'b0,
hi = 1'b1,
low = 1'b0;

parameter //for Transfer_type
none = 5'bz,
write = 5'b00010, //02
write_atomic = 5'b10010, //12
read = 5'b01010, //0A
read_atomic = 5'b11010, //1A
burst_write = 5'b00110, //06
burst_read = 5'b01110, //0E
burst_read_atomic = 5'b11110; //1E
parameter //for Transfer_code
data_transfer = 2'b00,
touch_load = 2'b01,
instruction_fetch = 2'b10,
reserved = 2'b11;

//initialize signals
initial
begin
pp <= 0;
address <= 32'bz;
line <= 256'bz;
end

//Perform sequence of transactions
initial
begin
repeat(2)@(posedge clk);
//BURST READ
pp <= ~pp;
address <= 32'h00000000;
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

initial
begin
wait(ABB_==low);
wait(ABB_==hi);
//BURST READ
pp <= ~pp;
address <= 32'h00000020;
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

initial
begin
repeat(75)@(posedge clk);
//BURST READ
pp <= ~pp;
address <= 32'h00000180;
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

initial
begin
repeat(100)@(posedge clk);
//BURST READ
pp <= ~pp;
address <= 32'h000001A0;
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

initial
begin
repeat(150)@(posedge clk);
//BURST READ
pp <= ~pp;
address <= 32'h00000040;
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

initial
begin
repeat(200)@(posedge clk);
//BURST WRITE
pp <= ~pp;
address <= 32'h000001C0;
Transfer_type <= burst_write;
Transfer_code <= data_transfer;
line <= {64'h7777777777777777, 64'h8888888888888888,
64'h1111111111111111, 64'h3333333333333333};
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

initial
begin
repeat(225)@(posedge clk);
//BURST READ
pp <= ~pp;
address <= 32'h00000060;
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

initial
begin
repeat(250)@(posedge clk);
//BURST READ
pp <= ~pp;
address <= 32'h00000100;
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

endmodule 
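Every transaction in sequencer4.v repeats the same six assignments. As a hedged sketch, that pattern could be folded into a Verilog task. The module name sequencer_task_sketch, the task name issue, and the free-running clock with a 15-unit period (inferred from the note that 330 simulation time units equal 22 clock cycles) are assumptions for illustration. The ABB_-based waits of timing rule 2 are omitted, so this only shows the clock-counted transactions.

```verilog
// Hypothetical sketch: the per-transaction pattern of sequencer4.v as a task.
module sequencer_task_sketch;
  parameter hi = 1'b1, low = 1'b0;
  parameter burst_read    = 5'b01110,   // 0E
            data_transfer = 2'b00;

  reg        clk;
  reg        pp, need_bus_trigger_;
  reg [0:31] address;
  reg [0:4]  Transfer_type;
  reg [0:1]  Transfer_code;

  // Assumed clock: period of 15 time units (330 units / 22 cycles).
  initial clk = 1'b0;
  always begin
    #7 clk = 1'b1;
    #8 clk = 1'b0;
  end

  // One CPU transaction: toggle pp, drive the transfer attributes,
  // then pulse need_bus_trigger_ low from t+4 to t+6.
  task issue;
    input [0:31] addr;
    input [0:4]  ttype;
    begin
      pp                <= ~pp;
      address           <= addr;
      Transfer_type     <= ttype;
      Transfer_code     <= data_transfer;
      need_bus_trigger_ <= #4 low;
      need_bus_trigger_ <= #6 hi;
    end
  endtask

  // Report each trigger pulse with the address being requested.
  always @(negedge need_bus_trigger_)
    $display("request %h at time %0t", address, $time);

  initial begin
    pp = 1'b0;
    need_bus_trigger_ = hi;
    repeat (2) @(posedge clk);
    issue(32'h00000000, burst_read);
    repeat (75) @(posedge clk);
    issue(32'h00000180, burst_read);
    #100 $finish;
  end
endmodule
```

The schedule mirrors the first and third transactions of Sequence #4; a task keeps the pp toggle and trigger pulse identical for every transaction instead of relying on copied lines.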



I. PREDICTION TEST RESULTS 



Host command: verilog 
Command arguments: 

-f verilog_arguments
bus_interface.v
prc.v
snooper.v
controller.v
datalist.v
line_mgr.v
predictor.v
testbench.v
arbiter.v
cpu.v
memory.v
sequencer5.v

VERILOG-XL 2.1.2 log file created Feb 2, 1996 13:14:29
VERILOG-XL 2.1.2 Feb 2, 1996 13:14:29

Copyright (c) 1994 Cadence Design Systems, Inc. All Rights Reserved. 

Unpublished — rights reserved under the copyright laws of the United States. 

Copyright (c) 1994 UNIX Systems Laboratories, Inc. Reproduced with Permission. 

THIS SOFTWARE AND ON-LINE DOCUMENTATION CONTAIN CONFIDENTIAL INFORMATION 
AND TRADE SECRETS OF CADENCE DESIGN SYSTEMS, INC. USE, DISCLOSURE, OR 
REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF 
CADENCE DESIGN SYSTEMS, INC. 

RESTRICTED RIGHTS LEGEND 

Use, duplication, or disclosure by the Government is subject to 
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in
Technical Data and Computer Software clause at DFARS 252.227-7013 or
subparagraphs (c)(1) and (2) of Commercial Computer Software — Restricted 
Rights at 48 CFR 52.227-19, as applicable. 

Cadence Design Systems, Inc. 

555 River Oaks Parkway 
San Jose, California 95134 

For technical assistance please contact the Cadence Response Center at 
1-800-CADENC2 or send email to crc_customers@cadence.com 

For more information on Cadence's Verilog-XL product line send email to 
talkverilog@cadence.com 

Compiling source file "bus_interface.v"
Compiling source file "prc.v"
Compiling source file "snooper.v"
Compiling source file "controller.v"
Compiling source file "datalist.v"
Compiling source file "line_mgr.v"
Compiling source file "predictor.v"
Compiling source file "testbench.v"
Compiling source file "arbiter.v"
Compiling source file "cpu.v"
Compiling source file "memory.v"
Compiling source file "sequencer5.v"

Highest level modules: 
testbench 



Line_mgr selected new ActiveLine = 0 at $d 5
Line_mgr selected new ActiveLine = 1 at $d 1162
Line_mgr selected new ActiveLine = 2 at $d 2287
Line_mgr selected new ActiveLine = 3 at $d 3412
Line_mgr selected new ActiveLine = 4 at $d 4537
Line_mgr selected new ActiveLine = 5 at $d 5662
Line_mgr selected new ActiveLine = 6 at $d 6787
Line_mgr selected new ActiveLine = 7 at $d 7912
Line_mgr selected new ActiveLine = 8 at $d 9037
Line_mgr selected new ActiveLine = 9 at $d 10162
Line_mgr selected new ActiveLine = 10 at $d 11287
Line_mgr selected new ActiveLine = 11 at $d 12412
Line_mgr selected new ActiveLine = 12 at $d 13537
Line_mgr selected new ActiveLine = 13 at $d 14662
Line_mgr selected new ActiveLine = 14 at $d 15787
Line_mgr selected new ActiveLine = 15 at $d 16912
Line_mgr selected new ActiveLine = 16 at $d 18037
Line_mgr selected new ActiveLine = 17 at $d 19162
Line_mgr selected new ActiveLine = 18 at $d 20287
Line_mgr selected new ActiveLine = 19 at $d 21412
Line_mgr selected new ActiveLine = 20 at $d 22537
Line_mgr selected new ActiveLine = 21 at $d 23662
Line_mgr selected new ActiveLine = 22 at $d 24787
Line_mgr selected new ActiveLine = 23 at $d 25912
Line_mgr selected new ActiveLine = 24 at $d 27037
Line_mgr selected new ActiveLine = 25 at $d 28162
Line_mgr selected new ActiveLine = 26 at $d 29287
Line_mgr selected new ActiveLine = 27 at $d 30412
Line_mgr selected new ActiveLine = 28 at $d 31537
Line_mgr selected new ActiveLine = 29 at $d 32662
Line_mgr selected new ActiveLine = 30 at $d 33787
Line_mgr selected new ActiveLine = 31 at $d 34912
Line_mgr selected new ActiveLine = 32 at $d 36037
Line_mgr selected new ActiveLine = 33 at $d 37162
Line_mgr selected new ActiveLine = 34 at $d 38287
Line_mgr selected new ActiveLine = 35 at $d 39412
Line_mgr selected new ActiveLine = 36 at $d 40537
Line_mgr selected new ActiveLine = 37 at $d 41662
Line_mgr selected new ActiveLine = 38 at $d 42787
Line_mgr selected new ActiveLine = 39 at $d 43912
Line_mgr selected new ActiveLine = 40 at $d 45037
Line_mgr selected new ActiveLine = 41 at $d 46162
Line_mgr selected new ActiveLine = 42 at $d 47287
Line_mgr selected new ActiveLine = 43 at $d 48412
Line_mgr selected new ActiveLine = 44 at $d 49537
Line_mgr selected new ActiveLine = 45 at $d 50662
Line_mgr selected new ActiveLine = 46 at $d 51787
Line_mgr selected new ActiveLine = 47 at $d 52912
Line_mgr selected new ActiveLine = 48 at $d 54037
Line_mgr selected new ActiveLine = 49 at $d 55162
Line_mgr selected new ActiveLine = 50 at $d 56287
Line_mgr selected new ActiveLine = 51 at $d 57412
Line_mgr selected new ActiveLine = 52 at $d 58537
Line_mgr selected new ActiveLine = 53 at $d 59662
Line_mgr selected new ActiveLine = 54 at $d 60787
Line_mgr selected new ActiveLine = 55 at $d 61912
Line_mgr selected new ActiveLine = 56 at $d 63037
Line_mgr selected new ActiveLine = 57 at $d 64162
Line_mgr selected new ActiveLine = 58 at $d 65287
Line_mgr selected new ActiveLine = 59 at $d 66412
Line_mgr selected new ActiveLine = 60 at $d 67537
Line_mgr selected new ActiveLine = 61 at $d 68662
Line_mgr selected new ActiveLine = 62 at $d 69787
Line_mgr selected new ActiveLine = 63 at $d 70912
Line_mgr selected new ActiveLine = 64 at $d 72037
Line_mgr selected new ActiveLine = 65 at $d 73162
Line_mgr selected new ActiveLine = 66 at $d 74287
Line_mgr selected new ActiveLine = 67 at $d 75412
Line_mgr selected new ActiveLine = 68 at $d 76537
Line_mgr selected new ActiveLine = 69 at $d 77662
Line_mgr selected new ActiveLine = 70 at $d 78787
Line_mgr selected new ActiveLine = 71 at $d 79912
Line_mgr selected new ActiveLine = 72 at $d 81037
Line_mgr selected new ActiveLine = 73 at $d 82162
Line_mgr selected new ActiveLine = 74 at $d 83287
Line_mgr selected new ActiveLine = 75 at $d 84412
Line_mgr selected new ActiveLine = 76 at $d 85537
Line_mgr selected new ActiveLine = 77 at $d 86662
Line_mgr selected new ActiveLine = 78 at $d 87787
Line_mgr selected new ActiveLine = 79 at $d 88912
Line_mgr selected new ActiveLine = 80 at $d 90037
Line_mgr selected new ActiveLine = 81 at $d 91162
Line_mgr selected new ActiveLine = 82 at $d 92287
Line_mgr selected new ActiveLine = 83 at $d 93412
Line_mgr selected new ActiveLine = 84 at $d 94537
Line_mgr selected new ActiveLine = 85 at $d 95662
Line_mgr selected new ActiveLine = 86 at $d 96787
Line_mgr selected new ActiveLine = 87 at $d 97912
Line_mgr selected new ActiveLine = 88 at $d 99037
Line_mgr selected new ActiveLine = 89 at $d 100162
Line_mgr selected new ActiveLine = 90 at $d 101287
Line_mgr selected new ActiveLine = 91 at $d 102412
Line_mgr selected new ActiveLine = 92 at $d 103537
Line_mgr selected new ActiveLine = 93 at $d 104662
Line_mgr selected new ActiveLine = 94 at $d 105787
Line_mgr selected new ActiveLine = 95 at $d 106912
Line_mgr selected new ActiveLine = 96 at $d 108037
Line_mgr selected new ActiveLine = 97 at $d 109162
Line_mgr selected new ActiveLine = 98 at $d 110287
Line_mgr selected new ActiveLine = 99 at $d 111412
Line_mgr selected new ActiveLine = 100 at $d 112537
Line_mgr selected new ActiveLine = 101 at $d 113662
Line_mgr selected new ActiveLine = 102 at $d 114787
Line_mgr selected new ActiveLine = 103 at $d 115912
Line_mgr selected new ActiveLine = 104 at $d 117037
Line_mgr selected new ActiveLine = 105 at $d 118162
Line_mgr selected new ActiveLine = 106 at $d 119287
Line_mgr selected new ActiveLine = 107 at $d 120412
Line_mgr selected new ActiveLine = 108 at $d 121537
Line_mgr selected new ActiveLine = 109 at $d 122662
Line_mgr selected new ActiveLine = 110 at $d 123787
Line_mgr selected new ActiveLine = 111 at $d 124912
Line_mgr selected new ActiveLine = 112 at $d 126037
Line_mgr selected new ActiveLine = 113 at $d 127162
Line_mgr selected new ActiveLine = 114 at $d 128287
Line_mgr selected new ActiveLine = 115 at $d 129412
Line_mgr selected new ActiveLine = 116 at $d 130537
Line_mgr selected new ActiveLine = 117 at $d 131662
Line_mgr selected new ActiveLine = 118 at $d 132787
Line_mgr selected new ActiveLine = 119 at $d 133912
Line_mgr selected new ActiveLine = 120 at $d 135037
Line_mgr selected new ActiveLine = 121 at $d 136162
Line_mgr selected new ActiveLine = 122 at $d 137287
Line_mgr selected new ActiveLine = 123 at $d 138412
Line_mgr selected new ActiveLine = 124 at $d 139537
Line_mgr selected new ActiveLine = 125 at $d 140662
Line_mgr selected new ActiveLine = 126 at $d 141787
Line_mgr selected new ActiveLine = 127 at $d 142912
Line_mgr selected new ActiveLine = 0 at $d 145162
Line_mgr selected new ActiveLine = 1 at $d 146287
Line_mgr selected new ActiveLine = 2 at $d 147412
Line_mgr selected new ActiveLine = 3 at $d 148537



L122 "testbench.v": $finish at simulation time 152010

31769681 simulation events + 8392 accelerated events

CPU time: 1.0 secs to compile + 0.9 secs to link + 116.2 secs in simulation

End of VERILOG-XL 2.1.2 Feb 2, 1996 13:16:34
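The spacing of the Line Manager messages in this log follows from the timing of Sequence #5. Taking the clock period implied by the sequencer notes (330 simulation time units for 22 clock cycles), each pass of the loop waits 50 clock cycles before its first burst read and 25 before its second, so successive new lines should be selected a fixed interval apart:

```latex
T_{\mathrm{clk}} = \frac{330}{22} = 15,
\qquad
\Delta t = (50 + 25)\,T_{\mathrm{clk}} = 75 \times 15 = 1125
```

This matches the log, for example 2287 - 1162 = 1125 between lines 1 and 2. The indices run from 0 through 127 before ActiveLine 0 is selected again at time 145162.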



J. LINE REPLACEMENT TEST 



/*******************************************************************************
* Transaction Sequencer - Line Replacement Test
* Filename: sequencer5.v
* Author: Joseph R. Robert, Jr.
* Date: 05JAN96
* Revised: 05JAN96
*
* Purpose: This is one in a set of modules which perform a sequence of CPU
* transactions. This Sequencer causes a series of CPU operations which will
* quickly fill the PRC. This will test the Line Replacement Unit's behavior
* when it needs to start replacing previously used lines.
*
* Sequence #5:
*
* for i = 0 to 132,
*   burst_read i000h - PRC should switch to new line i.
*   burst_read i020h - PRC should predict i040h, and store data in line i.
* next i
*
* When using this sequencer, set all trace flags to FALSE, except for the Line
* Manager, and run the simulation for 152000 steps.
*
* General Timing instructions for all Sequencers:
* Use an initial block for each transaction. You must ensure that the
* following rules are adhered to:
* 1. Before the first transaction, use
*    repeat(2)@(posedge clk)
* 2. Before the first line of the second transaction, use
*    wait(ABB_==low);
*    wait(ABB_==hi);
* 3. There can be only two transactions pipelined at a time. You must ensure
*    manually that the first operation is complete before the third begins.
*    When scheduling the current transaction, look at the transaction before
*    last. Wait for that TA_ to finish. Also, wait for the ABB_ from the
*    previous transaction to go high.
* 4. A burst read takes 330 simulation time units = 22 clock cycles.
*
*******************************************************************************/

module sequencer(Transfer_size,clk,pp,address,data,line,Transfer_type,
Transfer_code,need_bus_trigger_,ABB_);

input clk,ABB_;
output pp,need_bus_trigger_;
output [0:31] address;
output [0:63] data;
output [0:255] line;
output [0:4] Transfer_type;
output [0:2] Transfer_size;
output [0:1] Transfer_code;
reg pp,need_bus_trigger_;
reg [0:31] address;
reg [0:63] data;
reg [0:255] line;
reg [0:4] Transfer_type;
reg [0:2] Transfer_size;
reg [0:1] Transfer_code;

//declare variables, constants, parameters
parameter TRUE = 1'b1,
FALSE = 1'b0,
hi = 1'b1,
low = 1'b0;

parameter //for Transfer_type
none = 5'bz,
write = 5'b00010, //02
write_atomic = 5'b10010, //12
read = 5'b01010, //0A
read_atomic = 5'b11010, //1A
burst_write = 5'b00110, //06
burst_read = 5'b01110, //0E
burst_read_atomic = 5'b11110; //1E
parameter //for Transfer_code
data_transfer = 2'b00,
touch_load = 2'b01,
instruction_fetch = 2'b10,
reserved = 2'b11;

//Other internal control signals
reg [0:7] i; // counter

//initialize signals
initial
begin
pp <= 0;
address <= 32'bz;
line <= 256'bz;
end

//Perform sequence of transactions
initial
begin
repeat(2)@(posedge clk);
//BURST READ
pp <= ~pp;
address <= 32'h00000000;
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

initial
begin
wait(ABB_==low);
wait(ABB_==hi);
//BURST READ
pp <= ~pp;
address <= 32'h00000020;
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end

initial
begin
repeat(25)@(posedge clk);
for (i=1; i<=132; i=i+1)
begin
repeat(50)@(posedge clk);
//BURST READ
pp <= ~pp;
address <= {12'b0, i, 12'b0};
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
repeat(25)@(posedge clk);
//BURST READ
pp <= ~pp;
address <= {12'b0, i, 12'h020};
Transfer_type <= burst_read;
Transfer_code <= data_transfer;
need_bus_trigger_ <= #4 low;
need_bus_trigger_ <= #6 hi;
end
end



endmodule 
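Because the loop counter i is an 8-bit register placed between a 12-bit zero field and a 12-bit offset, each iteration of Sequence #5 reads address i × 1000h followed by i × 1000h + 20h. The hypothetical module below (seq5_address_sketch is not one of the thesis files) only prints the addresses those concatenations produce, as a quick check of the loop's address pattern.

```verilog
// Hypothetical check module; not one of the thesis source files.
module seq5_address_sketch;
  reg [0:7]  i;              // same width as the counter in sequencer5.v
  reg [0:31] first, second;

  initial begin
    for (i = 1; i <= 3; i = i + 1) begin
      first  = {12'b0, i, 12'b0};    // i lands in bits [12:19] of the [0:31] address
      second = {12'b0, i, 12'h020};
      $display("i=%0d first=%h second=%h", i, first, second);
    end
    i = 132;                         // last iteration of the loop
    first  = {12'b0, i, 12'b0};
    second = {12'b0, i, 12'h020};
    $display("i=%0d first=%h second=%h", i, first, second);
    // prints: i=132 first=00084000 second=00084020
  end
endmodule
```

The first iteration yields 00001000 and 00001020, so every pair of reads sits in a fresh 4 KB region rather than continuing any earlier stream.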



K. LINE REPLACEMENT TEST RESULTS



Host command: verilog 
Command arguments: 

-f verilog_arguments
bus_interface.v
prc.v
snooper.v
controller.v
datalist.v
line_mgr.v
predictor.v
testbench.v
arbiter.v
cpu.v
memory.v
sequencer4.v

VERILOG-XL 2.1.2 log file created Feb 2, 1996 13:22:22
VERILOG-XL 2.1.2 Feb 2, 1996 13:22:22

Copyright (c) 1994 Cadence Design Systems, Inc. All Rights Reserved. 

Unpublished — rights reserved under the copyright laws of the United States. 

Copyright (c) 1994 UNIX Systems Laboratories, Inc. Reproduced with Permission. 

THIS SOFTWARE AND ON-LINE DOCUMENTATION CONTAIN CONFIDENTIAL INFORMATION 
AND TRADE SECRETS OF CADENCE DESIGN SYSTEMS, INC. USE, DISCLOSURE, OR 
REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF 
CADENCE DESIGN SYSTEMS, INC. 

RESTRICTED RIGHTS LEGEND 
Use, duplication, or disclosure by the Government is subject to
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in
Technical Data and Computer Software clause at DFARS 252.227-7013 or 
subparagraphs (c)(1) and (2) of Commercial Computer Software — Restricted 
Rights at 48 CFR 52.227-19, as applicable. 

Cadence Design Systems, Inc. 

555 River Oaks Parkway 
San Jose, California 95134 

For technical assistance please contact the Cadence Response Center at 
1-800-CADENC2 or send email to crc_customers@cadence.com 

For more information on Cadence’s Verilog-XL product line send email to 
talkverilog@cadence.com 

Compiling source file "bus_interface.v"
Compiling source file "prc.v"
Compiling source file "snooper.v"
Compiling source file "controller.v"
Compiling source file "datalist.v"
Compiling source file "line_mgr.v"
Compiling source file "predictor.v"
Compiling source file "testbench.v"
Compiling source file "arbiter.v"
Compiling source file "cpu.v"
Compiling source file "memory.v"
Compiling source file "sequencer4.v"

Highest level modules: 
testbench 



Line_mgr selected new ActiveLine = 0 at Sd 5 

CPU started read from address 00000000 at time 45. 

CPU read: 0001020304050607 at 181 

CPU read: 08090a0b0c0d0e0f at 24 1 

CPU read: 1011 121314151617 at 301 

CPU read: 1819lalblcldlelf at 361 

CPU started read from address 00000020 at time 390. 

BIU started read from address 00000040 at time 412. 



CPU read: 2021222324252627 at 496 

CPU read: 28292a2b2c2d2e2f at 556 

CPU read: 3031323334353637 at 616 

CPU read: 38393a3b3c3d3e3f at 676 

BIU: 4041424344454647 at 812 

BIU: 48494a4b4c4d4e4f at 872 

BIU: 5051525354555657 at 932 

BIU: 58595a5b5c5d5e5f at 992 



DATALIST uploaded this data into line 00 at time 1008. 

404 1424 34445464748494a4b4c4d4e4f505 1 52535455565758595a5b5c5d5e5f 
CPU started read from address 00000180 at time 1 140. 
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Line_mgr selected new ActiveLine = 1 at $d 

CPU read: 0001020304050607 at 1276 

CPU read: 08090a0b0c0d0e0f at 1336 

CPU read: 1011121314151617 at 1396 

CPU read: 18191alblcldlelf at 1456 



CPU started read from address OOOOOlaO at time 1515. 



CPU read: 202 1 222324252627 at 1651 

BIU started read from address 00000 1 cO at time 1657. 

CPU read: 28292a2b2c2d2e2f at 1711 

CPU read: 3031323334353637 at 1771 

CPU read: 38393a3b3c3d3e3f at 1831 

BIU: 4041424344454647 at 1967 

BIU: 48494a4b4c4d4e4f at 2027 

BIU: 5051525354555657 at 2087 

BIU: 58595a5b5c5d5e5f at 2147 



DATALIST uploaded this data into line 01 at time 2163. 

404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f 
CPU started read from address 00000040 at time 2265. 

Line_mgr selected new ActiveLine = 0 at $d 2287 

DATALIST downloaded this data from line 00 at time 2313. 

404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f 



CPU read: 404 1424344454647 at 2356 

CPU read: 48494a4b4c4d4e4f at 2371 

CPU read: 5051525354555657 at 2386 

CPU read: 58595a5b5c5d5e5f at 2401 

BIU started read from address 00000060 at time 2482. 

BIU: 6061626364656667 at 2627 

BIU: 68696a6b6c6d6e6f at 2687 

BIU: 7071727374757677 at 2747 

BIU: 78797 a7b7c7d7e7f at 2807 



DATALIST uploaded this data into line 00 at time 2823. 

606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f 
CPU started write to address 00000 IcO at time 3007. 

CPU write beat 1: 7777777777777777 at 3022 

Line_mgr selected new ActiveLine = 1 at $d 3037 

Line manager flushed line 1 at time 3048. 

CPU write beat 2: 8888888888888888 at 3158 

CPU write beat 3: 1111111111111111 at 3218 

CPU write beat 4: 3333333333333333 at 3278 

CPU started read from address 00000060 at time 3390. 

Line_mgr selected new ActiveLine = 0 at $d 3412

DATALIST downloaded this data from line 00 at time 3438. 



606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f 



CPU read: 6061626364656667 at 3481 

CPU read: 68696a6b6c6d6e6f at 3496 

CPU read: 7071727374757677 at 3511 

CPU read: 78797a7b7c7d7e7f at 3526 

BIU started read from address 00000080 at time 3607. 

BIU: 0001020304050607 at 3752 

BIU: 08090a0b0c0d0e0f at 3812 



BIU: 1011121314151617 at 3872

BIU: 18191a1b1c1d1e1f at 3932

CPU started read from address 00000100 at time 3945. 

DATALIST uploaded this data into line 00 at time 3948. 

000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
Line_mgr selected new ActiveLine = 2 at $d 3982 



CPU read: 0001020304050607 at 4066 

CPU read: 08090a0b0c0d0e0f at 4126

CPU read: 1011121314151617 at 4186 

CPU read: 18191a1b1c1d1e1f at 4246



L123 "testbench.v": $finish at simulation time 6010

1661039 simulation events + 265 accelerated events 

CPU time: 0.8 secs to compile + 0.8 secs to link + 5.0 secs in simulation 

End of VERILOG-XL 2.1.2 Feb 2, 1996 13:22:29 
APPENDIX D. PRC STRUCTURE FILES



This appendix contains the Verilog files for the final 
hardware design. They include the Verilog structural models 
of the PRC and the testing results. The files are located on 
the ECE system at home5/robert/thesis/epoch/verilog. 

A. PRC 



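/******************************************************************************
* Predictive Read Cache
*
* Filename: prc.v
* Author: Joseph R. Robert, Jr.
* Date: 02OCT95
* Revised: 14MAR96
*
* Purpose: This module emulates the predictive read cache, connecting all the parts.
******************************************************************************/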

module prc(HRESET_,clk,BG_,DBG_,BR_,CANX,D,A,DP,TT,AP,TSIZ,TC,ABB_,AACK_,TS_,
           TBST_,DBB_,TA_,DPE_);

// epoch set_attribute FIXEDBLOCK = 1

input HRESET_,clk,BG_,DBG_; 

output BR_,CANX; 

inout [63:0] D; 

inout [31:0] A;

inout [7:0] DP; 

inout [4:0] TT; 

inout [3:0] AP; 

inout [2:0] TSIZ; 

inout [1:0] TC; 

inout ABB_,AACK_,TS_,TBST_,DBB_,TA_,DPE_; 

wire [255:0] DATALINE; 
wire [26:0] CAR,NAR,MRMA; 
wire [6:0] ActiveLine; 
wire [1:0] BURSTSTART; 

wire fetch_done,fetch_abort,send_done,read,write,hit,line_empty,
     snoop_ignore,upload,download,BR_,CANX;
tri [63:0] D;
tri [31:0] A; 
tri [7:0] DP; 
tri [4:0] TT; 
tri [3:0] AP; 
tri [2:0] TSIZ; 
tri [1:0] TC; 

tri ABB_,AACK_,TS_,TBST_,DBB_,TA_,DPE_;



//Connect parts which have been converted to hardware. 

//epoch precompiled predictor 

predictor PRE1(MRMA,CAR[25:0],predict,NAR,HRESET_);

// epoch precompiled line_mgr 

line_mgr LM1(CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store,
             new_replace,MRMA,ActiveLine,line_empty,hit,clk);

// epoch precompiled datalist 

datalist DL1(DATALINE,ActiveLine,upload,download);

// epoch precompiled snooper 

snooper SN1(A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART,
            read,write,HRESET_);

// epoch precompiled bus_interface 

bus_interface BIU1(NAR,BURSTSTART,BG_,AACK_,DBG_,send,fetch,
                   clk,BR_,upload,download,fetch_done,fetch_abort,
                   send_done,CANX,snoop_ignore,DATALINE,D,A,AP,DP,DPE_,
                   TT,TSIZ,TC,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);

// epoch precompiled controller 

controller CON1(HRESET_,read,write,hit,send_done,fetch_done,fetch_abort,
                line_empty,a_select,test,predict,store,
                flush,send,hold,new_replace,fetch,clk);



endmodule 



B. CONTROLLER



/******************************************************************************
* CONTROLLER
*
* Filename: controller.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 20MAR96
*
* Purpose: This module is a Finite State Machine that coordinates the actions of all the other functional
* blocks of the PRC. All control signals are synchronous with the system clock. HRESET_ forces the Controller
* into the IDLE state. The state diagram and state output tables give more details.
*
* Of particular significance are the wait states added to the state diagram of the behavioral model. These
* changes are highlighted in the State Output Table. They were required by the Line Manager, which has a
* significant propagation delay on its address path. This is described in more detail in the Line Manager
* section of this chapter, and it is a prime candidate for future work to improve the PRC's design.
******************************************************************************/

module controller (HRESET_,read,write,hit,send_done,fetch_done,fetch_abort,
                   line_empty,a_select,test,predict,store,
                   flush,send,hold,new_replace,fetch,clk);

//epoch set_attribute FIXEDBLOCK = 1 

input HRESET_,read,write,hit,send_done,fetch_done,fetch_abort,line_empty,clk; 
output a_select,test,predict,store,flush,send,hold,new_replace,fetch; 

reg a_select,test,predict,store,flush,send,hold,new_replace,fetch;



//Finite State Machine 



parameter // epoch enum stat
idle = 5'd0,
test_car_r = 5'd1,
send_data = 5'd2,
test_nar = 5'd3,
fetch_data = 5'd4,
is_line_empty = 5'd5,
predict_na = 5'd6,
store_car = 5'd7,
test_car_w = 5'd8,
flush_line = 5'd9,
wait_a = 5'd10,
wait_b = 5'd11,
wait_c = 5'd12,
wait_d = 5'd13,
wait_e = 5'd14,
wait_f = 5'd15,
wait_g = 5'd16,
wait_h = 5'd17,
wait_i = 5'd18,
dc_state = 5'bx;



reg [4:0] /* epoch enum stat */ state, next_state; 

reg a_select,fetch,flush,hold,new_replace,predict,send,store,test; 
always @(posedge clk or negedge HRESET_)
begin
if (!HRESET_)
  state = idle;
else
  state = next_state;
end

always @(state or read or write or hit or send_done or line_empty or 
fetch_done or fetch_abort) 
begin 

//default values 

a_select = 1'b0; //CAR
fetch = 1'b0;
flush = 1'b0;
hold = 1'b0;
new_replace = 1'b0;
predict = 1'b0;
send = 1'b0;
store = 1'b0;
test = 1'b0;



case (state) 

idle: //0
begin
if (read == 1'b0 & write == 1'b0) next_state = idle;
else if (read == 1'b0 & write == 1'b1) next_state = wait_d;
else if (read == 1'b1) next_state = wait_a;
else next_state = dc_state;
end

wait_a: //10
begin
hold = 1'b1;
next_state = wait_b;
end

wait_b: //11
begin
hold = 1'b1;
next_state = wait_c;
end

wait_c: //12
begin
hold = 1'b1;
next_state = test_car_r;
end

wait_d: //13
begin
hold = 1'b1;
next_state = wait_e;
end

wait_e: //14
begin
hold = 1'b1;
next_state = wait_f;
end

wait_f: //15
begin
hold = 1'b1;
next_state = test_car_w;
end

test_car_r: //1
begin
test = 1'b1;
hold = 1'b1;
if (hit)
  next_state = send_data;
else next_state = is_line_empty;
end

send_data: //2
begin
a_select = 1'b1; //NAR
predict = 1'b1;
send = 1'b1;
hold = 1'b1;
if (send_done)
  next_state = test_nar;
else next_state = send_data;
end

test_nar: //3
begin
a_select = 1'b1; //NAR
test = 1'b1;
hold = 1'b1;
if ({hit,read,write} == 3'b000) next_state = fetch_data;
else next_state = idle;
end

fetch_data: //4
begin
a_select = 1'b1; //NAR
hold = 1'b1;
fetch = 1'b1;
if ({fetch_done,fetch_abort} == 2'b00) next_state = fetch_data;
else next_state = idle;
end

is_line_empty: //5
begin
hold = 1'b1;
if (line_empty)
  next_state = store_car;
else next_state = predict_na;
end

predict_na: //6
begin
a_select = 1'b1;
predict = 1'b1;
hold = 1'b1;
new_replace = 1'b1;
next_state = wait_g;
end

wait_g: //16
begin
a_select = 1'b1; //NAR
hold = 1'b1;
next_state = wait_h;
end

wait_h: //17
begin
a_select = 1'b1; //NAR
hold = 1'b1;
next_state = wait_i;
end

wait_i: //18
begin
a_select = 1'b1; //NAR
hold = 1'b1;
next_state = test_nar;
end

store_car: //7
begin
store = 1'b1;
hold = 1'b1;
next_state = idle;
end

test_car_w: //8
begin
test = 1'b1;
hold = 1'b1;
if (hit)
  next_state = flush_line;
else next_state = idle;
end

flush_line: //9
begin
flush = 1'b1;
hold = 1'b1;
next_state = idle;
end

default: 

begin 

next_state = dc_state; 
end 

endcase 

end 

endmodule 
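The transitions in the case statement above can be cross-checked with a small behavioral model. The Python sketch below is illustrative only and is not one of the thesis design files: next_state transcribes the Controller's next-state logic (the numeric state codes follow the parameter list), and the helper run, a hypothetical convenience, applies one set of inputs per clock. The status inputs are treated as ideal 0/1 values, so the propagation delays that motivated the wait states are not modeled.

```python
# Behavioral sketch of the Controller's next-state logic (controller.v).
# Not part of the original design files; state codes follow the parameter list.
IDLE, TEST_CAR_R, SEND_DATA, TEST_NAR, FETCH_DATA = 0, 1, 2, 3, 4
IS_LINE_EMPTY, PREDICT_NA, STORE_CAR, TEST_CAR_W, FLUSH_LINE = 5, 6, 7, 8, 9
WAIT_A, WAIT_B, WAIT_C, WAIT_D, WAIT_E, WAIT_F, WAIT_G, WAIT_H, WAIT_I = range(10, 19)

# States whose successor does not depend on any input.
UNCONDITIONAL = {
    WAIT_A: WAIT_B, WAIT_B: WAIT_C, WAIT_C: TEST_CAR_R,      # read-path waits
    WAIT_D: WAIT_E, WAIT_E: WAIT_F, WAIT_F: TEST_CAR_W,      # write-path waits
    PREDICT_NA: WAIT_G, WAIT_G: WAIT_H, WAIT_H: WAIT_I, WAIT_I: TEST_NAR,
    STORE_CAR: IDLE, FLUSH_LINE: IDLE,
}


def next_state(state, read=0, write=0, hit=0, send_done=0,
               fetch_done=0, fetch_abort=0, line_empty=0):
    """Return the state the Controller enters on the next rising clock edge."""
    if state in UNCONDITIONAL:
        return UNCONDITIONAL[state]
    if state == IDLE:
        if read:
            return WAIT_A              # a read takes priority over a write
        return WAIT_D if write else IDLE
    if state == TEST_CAR_R:
        return SEND_DATA if hit else IS_LINE_EMPTY
    if state == SEND_DATA:
        return TEST_NAR if send_done else SEND_DATA
    if state == TEST_NAR:
        return FETCH_DATA if (hit, read, write) == (0, 0, 0) else IDLE
    if state == FETCH_DATA:
        return IDLE if (fetch_done or fetch_abort) else FETCH_DATA
    if state == IS_LINE_EMPTY:
        return STORE_CAR if line_empty else PREDICT_NA
    if state == TEST_CAR_W:
        return FLUSH_LINE if hit else IDLE
    raise ValueError(f"undefined state {state}")   # dc_state in the Verilog


def run(steps, state=IDLE):
    """Apply one dict of inputs per clock and return every state visited."""
    path = [state]
    for inputs in steps:
        state = next_state(state, **inputs)
        path.append(state)
    return path


# Read hit whose predicted next address (NAR) misses: three wait states,
# send the line, test the NAR, then fetch the predicted line.
print(run([{"read": 1}, {}, {}, {}, {"hit": 1}, {"send_done": 1}, {}]))
# prints [0, 10, 11, 12, 1, 2, 3, 4]
```

The trace shows the three wait states (wait_a through wait_c) that every read passes through before test_car_r.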



C. SNOOPER



/******************************************************************************
* SNOOPER
*
* Filename: snooper.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 06MAR96
*
* Purpose: This module watches the system bus activity and makes appropriate reports to the PRC
* Controller.
*
* If the transaction is a data burst read or any kind of write, and if the address parity is correct, then the read
* or write signal is asserted as appropriate, and the address is placed in the CAR. The snoop_ignore signal tells this
* unit to ignore the current transaction because it was initiated by the Bus Interface Unit; snoop_ignore must be
* asserted concurrently with the transfer attributes. Reads that are not burst or data related are ignored by
* the PRC. The CAR is updated only on transactions relevant to the PRC.
*
* Because of the PowerPC's two-stage pipelining of memory accesses, a second address tenure can occur shortly
* after the first, well before the first data tenure is complete. To compensate for this, the read and write outputs
* of the Snooper remain asserted until the Controller acknowledges them with hold. The rising edge of hold
* indicates that the read or write signal was received by the Controller. The Snooper can then negate these
* signals, but must leave the CAR unchanged until hold is negated. After hold is negated, the CAR can be updated
* to the new address.
*
* In Stage 0, the transfer attributes are latched in registers. Combinational logic determines whether these
* transfer attributes represent a valid read or a valid write, and whether the address parity is correct. If the
* transaction is valid and is one that the PRC is interested in, Stage 0 raises the transaction_waiting signal.
*
* A Finite State Machine in Stage 1 sits in the IDLE state until it receives that signal. It then latches the signals
* needed from Stage 0, resets transaction_waiting, and waits for the hold signal to go low. A high hold signal
* indicates that the PRC is not done with the previous transaction. Once hold goes low, the read and write flags
* are set according to the type of the current transaction, and the input address is stored in the Current Address
* Register. The FSM then waits for the rising edge of hold before returning to the IDLE state, where it can check
* whether another transaction is waiting.
******************************************************************************/

module snooper (A,AP,TT,TC,TS_,snoop_ignore,hold,clk,CAR,BURSTSTART, 
read_flag,write_flag,HRESET_); 

//epoch set_attribute FIXEDBLOCK = 1 

input [31:0] A; 
input [3:0] AP; 
input [4:0] TT; 
input [1:0] TC; 

input TS_,snoop_ignore,hold,clk,HRESET_; 
output [26:0] CAR; 
output [1:0] BURSTSTART; 
output read_flag,write_flag; 

wire [31:0] address0;

wire [28:0] address1;

wire [26:0] CAR; 

wire [4:0] TransferType; 

wire [3:0] addr_parity; 

wire [1:0] BURSTSTART,TransferCode; 

wire car_latch,flag_reset_,hold_,ignore,latch0,latch1,parity_error,
     read_flag,read_set_,TS,transaction_waiting,tw_set,tw_reset_,
     valid_op,valid_read0,valid_read1,valid_write0,valid_write1,
     w1,w2,w3,w5,w6,w7,write_flag_,write_set_,prelatch0;



//STAGE 0 

//Stage 0 latches 

stdinv TS_INV (TS_,TS); 

stddff TS_Latch (.CLK(clk),.D(TS),.Q(prelatch0));

stdbuf Latch0Buffer (.IN0(prelatch0),.Y(latch0));

dff #(32,0,"AUTO","1") AddressLatch0 (.CLK(latch0),.D(A),.Q(address0));
dff #( 4,0,"AUTO","1") AddrParityLatch (.CLK(latch0),.D(AP),.Q(addr_parity));
dff #( 5,0,"AUTO","1") TransferTypeLatch (.CLK(latch0),.D(TT),.Q(TransferType));
dff #( 2,0,"AUTO","1") TransferCodeLatch (.CLK(latch0),.D(TC),.Q(TransferCode));
stddff IgnoreLatch (.CLK(latch0),.D(snoop_ignore),.Q(ignore));
//Odd parity checker
parityo_chk32
  OddParityChecker (.D(address0),.PIN(addr_parity),.ERROR(parity_error));

//Read checker 

stdnor2 NOR_C (TransferCode[1],TransferCode[0],w1);
stdinv INV_F (TransferType[0],TransferType0_);

stdand4 AND_D (TransferType[3],TransferType[2],TransferType[1],TransferType0_,
               w2);

stdand2 AND_E (w1,w2,valid_read0);

//Write checker

stdinv INV_J (TransferType[3],TransferType3_);
stdnand2 NAND_H (TransferType[4],TransferType[2],w3);

stdand4 AND_G (TransferType3_,TransferType[1],TransferType0_,w3,valid_write0);
//Transaction checker

stdnor2 NOR_L (valid_write0,valid_read0,w5);

stdnor3 NOR_M (parity_error,w5,ignore,valid_transaction);

//Transaction Waiting Latch

stdand2 TW_SetAND (latch0,valid_transaction,tw_set);
stdand2 TW_ResetAND (tw_reset1_,HRESET_,tw_reset_);
stdlatch_c TW_Latch
  (.D(tw_set),.CLR(tw_reset_),.EN(latch0),.Q(transaction_waiting));

//STAGE 1 

//Stage 1 latches 
dff #(29,0,"AUTO","1")
  AddressLatch1 (.CLK(latch1),.D(address0[31:3]),.Q(address1));
stddff ValidReadLatch1 (.CLK(latch1),.D(valid_read0),.Q(valid_read1));
stddff ValidWriteLatch1 (.CLK(latch1),.D(valid_write0),.Q(valid_write1));

//read and write flags

stdinv HOLD_INV (hold,hold_);

stdand2 FLAG_RESET_AND (.IN0(hold_),.IN1(HRESET_),.Y(flag_reset_));
stddff_c ReadFlagLatch
  (.CLK(flag_clk),.CLR(flag_reset_),.D(valid_read1),.Q(read_flag));
stddff_c WriteFlagLatch
  (.CLK(flag_clk),.CLR(flag_reset_),.D(valid_write1),.Q(write_flag));

//Current Address Register 
dff #(29,0,"AUTO","1")
  CA_Register (.CLK(car_latch),.D(address1),.Q({CAR,BURSTSTART}));



//FINITE STATE MACHINE 
parameter // epoch enum stat 
IDLE = 3'd0,
LATCH = 3'd1,
OUTPUTS = 3'd2,
WAIT_FOR_HOLD = 3'd3,
WAIT_FOR_NOT_HOLD = 3'd4,
dc_state = 3'bxx;

reg [2:0] /* epoch enum stat */ state, next_state; 

reg latch1,tw_reset1_,flag_clk,car_latch;

always @(posedge clk or negedge HRESET_)
begin
if (!HRESET_)
  state = IDLE;
else
  state = next_state;
end

always @(state or transaction_waiting or hold) 
begin 

//default values 
latch1 = 1'b0;
tw_reset1_ = 1'b1;
flag_clk = 1'b0;
car_latch = 1'b0;

case (state) 

IDLE: begin 

if (transaction_waiting) 
next_state = LATCH; 
else next_state = IDLE; 
end 

LATCH: begin 

latch1 = 1'b1;
tw_reset1_ = 1'b0;
if (hold) 

next_state = WAIT_FOR_NOT_HOLD; 
else next_state = OUTPUTS; 
end 

WAIT_FOR_NOT_HOLD: begin 
if (hold) 

next_state = WAIT_FOR_NOT_HOLD; 
else next_state = OUTPUTS; 
end 



OUTPUTS: begin 



flag_clk = 1'b1;
car_latch = 1'b1;

next_state = WAIT_FOR_HOLD; 
end 

WAIT_FOR_HOLD: begin
if (hold) 

next_state = IDLE; 

else next_state = WAIT_FOR_HOLD; 
end 

default: begin 

next_state = dc_state; 
end 

endcase

end



endmodule 
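The Stage 0 checkers and the Stage 1 handshake described in the Snooper header can be summarized in a short Python sketch. It is an illustrative model written for this appendix, not part of the design files: classify evaluates the same gates as the Read, Write, and Transaction checkers, with bit i of tt and tc standing for TransferType[i] and TransferCode[i], and stage1_next transcribes the Stage 1 case statement. The sketch makes no claim about which PowerPC bus transactions these bit patterns encode.

```python
# Illustrative model of the Snooper (snooper.v); not part of the design files.


def bit(value, i):
    """Return bit i of an integer as 0 or 1."""
    return (value >> i) & 1


def classify(tt, tc, parity_error=0, ignore=0):
    """Stage 0: return (valid_read0, valid_write0, valid_transaction).

    tt is the 5-bit TransferType and tc the 2-bit TransferCode; bit i of each
    integer stands for TransferType[i] / TransferCode[i].
    """
    # Read checker: NOR_C, INV_F, AND_D, AND_E
    valid_read = (not bit(tc, 1) and not bit(tc, 0)
                  and bit(tt, 3) and bit(tt, 2) and bit(tt, 1) and not bit(tt, 0))
    # Write checker: INV_J, NAND_H, AND_G
    valid_write = (not bit(tt, 3) and bit(tt, 1) and not bit(tt, 0)
                   and not (bit(tt, 4) and bit(tt, 2)))
    # Transaction checker: NOR_L, NOR_M
    valid = (valid_read or valid_write) and not parity_error and not ignore
    return int(valid_read), int(valid_write), int(valid)


# Stage 1 state codes, as in the Snooper's parameter list.
IDLE, LATCH, OUTPUTS, WAIT_FOR_HOLD, WAIT_FOR_NOT_HOLD = range(5)


def stage1_next(state, transaction_waiting, hold):
    """Stage 1: next state of the hold handshake FSM."""
    if state == IDLE:
        return LATCH if transaction_waiting else IDLE
    if state in (LATCH, WAIT_FOR_NOT_HOLD):
        return WAIT_FOR_NOT_HOLD if hold else OUTPUTS   # previous transaction busy
    if state == OUTPUTS:
        return WAIT_FOR_HOLD                            # flags and CAR just clocked
    if state == WAIT_FOR_HOLD:
        return IDLE if hold else WAIT_FOR_HOLD          # Controller acknowledged
    raise ValueError(f"undefined state {state}")
```

For example, classify(0b01110, 0b00) returns (1, 0, 1); with parity_error=1 the same attributes return (1, 0, 0), so the transaction is never reported to the Controller.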



1. Thirty-Two-Input, Odd-Parity Checker



/******************************************************************************
* ODD PARITY CHECKER
*
* Filename: parityo_chk32.v
* Author: Joseph R. Robert, Jr.
* Date: 12FEB96
* Revised: 12FEB96
*
* Purpose: This module checks the parity of the input data, comparing each byte of D with the corresponding bit
* of PIN. Parity is odd, including the parity bit.
******************************************************************************/

module parityo_chk32 (D,PIN,ERROR);

input [31:0] D; 
input [3:0] PIN; 
output ERROR; 

wire ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR;
parityco #(8,0,"AUTO","1")
  parity_group_0 (.D(D[ 7: 0]),.PIN(PIN[0]),.ERROR(ERROR_0));
parityco #(8,0,"AUTO","1")
  parity_group_1 (.D(D[15: 8]),.PIN(PIN[1]),.ERROR(ERROR_1));
parityco #(8,0,"AUTO","1")
  parity_group_2 (.D(D[23:16]),.PIN(PIN[2]),.ERROR(ERROR_2));
parityco #(8,0,"AUTO","1")
  parity_group_3 (.D(D[31:24]),.PIN(PIN[3]),.ERROR(ERROR_3));
stdor4 OR_A (ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR);



endmodule 
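A bit-level reference model is a convenient way to generate stimulus and expected results for this checker. The Python sketch below is not from the thesis; it assumes, from the header's statement that parity is odd including the parity bit, that each parityco instance raises ERROR when its byte plus its PIN bit contain an even number of ones. The helper names lane, odd_parity_bits, and parity_error are hypothetical.

```python
# Illustrative model of parityo_chk32 (not part of the design files).
# Byte lane k is D[8k+7:8k] and is checked together with PIN[k]; the lane is
# correct when those nine bits contain an odd number of ones (assumption).


def lane(data32, k):
    """Return byte lane k (0 = D[7:0]) of a 32-bit value."""
    return (data32 >> (8 * k)) & 0xFF


def odd_parity_bits(data32):
    """Parity bits PIN[3:0] that make every lane odd, including the parity bit."""
    pin = 0
    for k in range(4):
        if bin(lane(data32, k)).count("1") % 2 == 0:
            pin |= 1 << k          # even count in the data: parity bit must be 1
    return pin


def parity_error(data32, pin4):
    """Return 1 if any lane has an even number of ones (the OR_A output)."""
    for k in range(4):
        ones = bin(lane(data32, k)).count("1") + ((pin4 >> k) & 1)
        if ones % 2 == 0:
            return 1
    return 0
```

For instance, an all-zero address needs PIN = 1111, and flipping any one of those parity bits makes parity_error return 1.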



D. LINE MANAGER



/******************************************************************************
* LINE MANAGER
* Filename: line_mgr.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 20MAR96
*
* Purpose: The function of this module is completely described in the behavioral model.
*
* This structural model uses a high-speed RAM (hsram) for the MRMA List. The CAR is stored into this
* RAM on a store or fetch_done signal.
*
* The predicted_ma_list is a register file for storing predicted memory addresses. This list is composed
* of 128 address registers, 128 equality comparators, and 128 Valid status flags. The NAR is stored in this list on
* the fetch_done pulse. If there is a match with the input address (in_addr), a priority encoder (ENC_C) determines
* which line matches.
*
* The line replacement unit determines the next line to be replaced whenever the PRC needs to start a new
* line. It first selects invalid lines. If all the lines are valid, it selects lines that have been "aged". A priority
* encoder (ENC1) chooses the line with the lowest index among all the lines that can be replaced. If no line can be
* replaced, the encoder's enable output (eo) is used to cause aging.
*
* Aging is accomplished with a 7-bit counter (ager_counter), initially set to zero. When the
* cause_aging signal from the encoder is high, the counter advances. A decoder (DEC_B) output causes the
* appropriate Aged flag to be set.
*
* A change in the CAR or NAR takes 25 ns (1.8 cycles) to propagate through the input address multiplexer
* (in_addr mux). This required the addition of wait states in the Controller before each of the tests.
*
* The Revised Controller State Diagram and Revised Controller State Output Table show the required changes.
******************************************************************************/

module line_mgr (CAR,NAR,HRESET_,a_select,test,fetch_done,flush,store, 
new_replace,MRMA_out,ActiveLine,line_empty,hit,clk); 

// epoch set_attribute FIXEDBLOCK = 1

input [26:0] CAR,NAR; 

input HRESET_,a_select,test,fetch_done,flush,store,new_replace,clk; 
output [26:0] MRMA_out; 
output [6:0] ActiveLine; 
output line_empty,hit; 
wire [127:0] Valid;

wire [26:0] in_addr,in_addr_buf,MRMA_out;
wire [6:0] ActiveLine,ReplaceLine,match_line,w1;
wire MRMA_write_,match,all_lines_valid,a_select_,w2,hit,
     hreset,store_,line_empty,le_set_;

//Address multiplexer 
mux2 #(27,0,"AUTO","1")
  addrmux (.IN0(CAR),.IN1(NAR),.S0(a_select),.Y(in_addr));
buff #(27,0,"AUTO","20") InAddrBuffer (.IN0(in_addr),.Y(in_addr_buf));

//MRMA_list

stdnor2 MRMA_NOR (.IN0(store),.IN1(fetch_done),.Y(MRMA_write_));
hsram #(27,128,7,32,1,"2")
  MRMA_list (.A(ActiveLine),.DIN(CAR),.WR(MRMA_write_),.DOUT(MRMA_out));
//PredMA_list 

//epoch precompiled predicted_ma_list 
predicted_ma_list
  PredMA_list1 (NAR,in_addr_buf,ActiveLine,fetch_done,flush,HRESET_,
                Valid,match_line,match);
and128 all_valid_ands (Valid,all_lines_valid);

//Line Replacement Unit 

// epoch precompiled line_replacement_unit 

line_replacement_unit LRU1 (Valid,ActiveLine,all_lines_valid,

new_replace,fetch_done,HRESET_,clk,ReplaceLine); 

//ActiveLine pointer 

stdbufinv a_select_inv(.IN0(a_select),.Y(a_select_));
stdand2 AL_AND (.IN0(test),.IN1(a_select_),.Y(w2));
mux2 #(7,0,"AUTO","1")
  al_mux (.IN0(ReplaceLine),.IN1(match_line),.S0(match),.Y(w1));
dff_c #(7,0,"AUTO","1")
  ActiveLineReg (.CLK(w2),.CLR(HRESET_),.D(w1),.Q(ActiveLine));

//Hit status flag 

stdlatch hit_latch(.D(match),.EN(test),.Q(hit)); 

//line_empty status flag 

stdbufinv HRESET_inv(.IN0(HRESET_),.Y(hreset));
stdbufinv store_inv(.IN0(store),.Y(store_));
stdnor2 LE_NOR (.IN0(hreset),.IN1(new_replace),.Y(le_set_));
srlatch line_empty_latch(.S_(le_set_),.R_(store_),.Q(line_empty)); 

endmodule 
1. Address Register With Equality Comparator



/******************************************************************************
* ADDRESS REGISTER WITH EQUALITY COMPARATOR for PredMA storage
*
* Filename: addre.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 13FEB96
*
* Purpose: This structural model is a building block for the Predicted Memory Address List (PredMA_List). It
* consists of a single 27-bit register and an equality comparator. The output of the register is compared with the
* input address (in_addr).
******************************************************************************/

module addre (NAR,in_addr,store_enable,eq,HRESET_);

// epoch set_attribute FIXEDBLOCK = 1 

input [26:0] NAR,in_addr; 
input store_enable,HRESET_; 
output eq; 

wire [26:0] w1;
wire eq; 

dff_c #(27,0,"AUTO","1") PredMA_reg (.CLK(store_enable),.CLR(HRESET_),
                                     .D(NAR),.Q(w1));

equal #(27,0,"AUTO","1") equal1 (.A(w1),.B(in_addr),.Y(eq));
endmodule 



2. AND Gate With 128 Inputs and One Output 



/******************************************************************************
* 128-INPUT AND GATE
* Filename: and128.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 20MAR96
*
* Purpose: This structural model is a 128-input AND gate.
******************************************************************************/

module and128 (in,out);
input [127:0] in;
output out;

wire out,out_unbuffered; 



wire [31:0] A; 
wire [7:0] B; 
wire [1:0] C; 

and4 #(32,0,"AUTO","1") AND_A (.IN0(in[127:96]),.IN1(in[95:64]),
                               .IN2(in[63:32]),.IN3(in[31:0]),.Y(A));
and4 #( 8,0,"AUTO","1") AND_B (.IN0(A[31:24]),.IN1(A[23:16]),
                               .IN2(A[15:8]),.IN3(A[7:0]),.Y(B));
and4 #( 2,0,"AUTO","1") AND_C (.IN0(B[7:6]),.IN1(B[5:4]),
                               .IN2(B[3:2]),.IN3(B[1:0]),.Y(C));
stdand2 AND_D (.IN0(C[0]),.IN1(C[1]),.Y(out_unbuffered));
stdbuf #("15") OutputBuffer (.IN0(out_unbuffered),.Y(out));



endmodule 
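The AND tree above reduces 128 inputs in four levels: AND_A combines four 32-bit slices bitwise, AND_B four 8-bit slices of the result, AND_C four 2-bit slices, and AND_D the final pair. The Python sketch below is an illustrative check written for this appendix, not a thesis tool; tree_reduce_128 is a hypothetical name, and passing operator.or_ instead of operator.and_ models the identically structured OR tree (or128) that appears later in this appendix.

```python
import operator

# Illustrative check (not part of the design files) that a four-level tree of
# 4-input gates reduces 128 inputs to one output: 128 -> 32 -> 8 -> 2 -> 1.


def tree_reduce_128(bits, op):
    """Apply op through the same slicing as AND_A, AND_B, AND_C and AND_D."""
    value, width = bits, 128
    for _ in range(3):                      # AND_A, AND_B, AND_C
        width //= 4
        mask = (1 << width) - 1
        s = [(value >> (k * width)) & mask for k in range(4)]
        value = op(op(s[0], s[1]), op(s[2], s[3]))
    return op(value & 1, (value >> 1) & 1)  # two-input AND_D


ALL_ONES = (1 << 128) - 1
```

Clearing any single input bit of ALL_ONES drives the AND result to 0, which is the property a 128-input AND gate must have.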



3. Codefile for Seven-to-128 Decoder (dec7to128e.codefile)



// PLA TABLE for 7 to 128 decoder with enable

// in0 in1 in2 in3 in4 in5 in6 EN

00000001 //line0
00000011 //line1
00000101 //line2
00000111 //line3
00001001 //line4
00001011 //line5
00001101 //line6
00001111 //line7
00010001 //line8
00010011 //line9
00010101 //line10
00010111 //line11
00011001 //line12
00011011 //line13
00011101 //line14
00011111 //line15
00100001 //line16
00100011 //line17
00100101 //line18
00100111 //line19
00101001 //line20
00101011 //line21
00101101 //line22
00101111 //line23
00110001 //line24
00110011 //line25
00110101 //line26
00110111 //line27
00111001 //line28
00111011 //line29
00111101 //line30
00111111 //line31
01000001 //line32
01000011 //line33
01000101 //line34
01000111 //line35
01001001 //line36
01001011 //line37
01001101 //line38
01001111 //line39
01010001 //line40
01010011 //line41
01010101 //line42
01010111 //line43
01011001 //line44
01011011 //line45
01011101 //line46
01011111 //line47
01100001 //line48
01100011 //line49
01100101 //line50
01100111 //line51
01101001 //line52
01101011 //line53
01101101 //line54
01101111 //line55
01110001 //line56
01110011 //line57
01110101 //line58
01110111 //line59
01111001 //line60
01111011 //line61
01111101 //line62
01111111 //line63
10000001 //line64
10000011 //line65
10000101 //line66
10000111 //line67
10001001 //line68
10001011 //line69
10001101 //line70
10001111 //line71
10010001 //line72
10010011 //line73
10010101 //line74
10010111 //line75
10011001 //line76
10011011 //line77
10011101 //line78
10011111 //line79
10100001 //line80
10100011 //line81
10100101 //line82
10100111 //line83
10101001 //line84
10101011 //line85
10101101 //line86
10101111 //line87
10110001 //line88
10110011 //line89
10110101 //line90
10110111 //line91
10111001 //line92
10111011 //line93
10111101 //line94
10111111 //line95
11000001 //line96
11000011 //line97
11000101 //line98
11000111 //line99
11001001 //line100
11001011 //line101
11001101 //line102
11001111 //line103
11010001 //line104
11010011 //line105
11010101 //line106
11010111 //line107
11011001 //line108
11011011 //line109
11011101 //line110
11011111 //line111
11100001 //line112
11100011 //line113
11100101 //line114
11100111 //line115
11101001 //line116
11101011 //line117
11101101 //line118
11101111 //line119
11110001 //line120
11110011 //line121
11110101 //line122
11110111 //line123
11111001 //line124
11111011 //line125
11111101 //line126
11111111 //line127

//END TABLE 
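Every row of this codefile follows one pattern: the seven select bits written as a binary number, most significant bit first, followed by a 1 for EN. The table can therefore be regenerated, or checked against the file on disk, with a short script. The Python sketch below is an illustration, not a tool used in the thesis; pla_rows and row_to_line are hypothetical names, and the row text uses a "bits //lineN" layout.

```python
# Regenerate and decode rows of dec7to128e.codefile (illustrative only).
# Row k: the 7-bit binary value of k (MSB first), then EN = 1, then a comment.


def pla_rows():
    """Return the 128 table rows in order, line 0 first."""
    return [f"{k:07b}1 //line{k}" for k in range(128)]


def row_to_line(row):
    """Return the output line selected by a row; EN (the 8th bit) must be 1."""
    bits = row.split()[0]
    if len(bits) != 8 or bits[7] != "1":
        raise ValueError(f"malformed row: {row!r}")
    return int(bits[:7], 2)
```

Decoding every generated row with row_to_line yields the line numbers 0 through 127 in order, so no select value is missing or duplicated.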



4. One-Hundred-and-Twenty-Eight-Input, Seven-Output Encoder, Priority to Low Bits



/******************************************************************************
* 128 TO 7 ENCODER, PRIORITY LOW
* Filename: enc128to7lo.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 13FEB96
*
* Purpose: This structural model is a 128-bit input, 7-bit output priority encoder. The highest priority is given to
* the bit with the lowest index. Inputs and outputs are active high. It is composed of four 32 to 5 priority encoders
* and the logic gates necessary to connect them together.
******************************************************************************/



module enc128to7lo (I,A,ei,eo,gs);

// epoch set_attribute FIXEDBLOCK = 1 

input [127:0] I; 
input ei; 
output [6:0] A; 
output gs,eo; 

wire [4:0] g0A,g1A,g2A,g3A;

wire g3eo,g2eo,g1eo,g3gs,g2gs,g1gs,g0gs,eo,gs;

enc32to5lo ENCg3 (I[127:96],g3A,g2eo,eo,g3gs);
enc32to5lo ENCg2 (I[ 95:64],g2A,g1eo,g2eo,g2gs);
enc32to5lo ENCg1 (I[ 63:32],g1A,g0eo,g1eo,g1gs);
enc32to5lo ENCg0 (I[ 31: 0],g0A,ei,g0eo,g0gs);

//Group Select

stdor4 OR_A (g3gs,g2gs,g1gs,g0gs,gs);

//A6

stdor2 OR_B (g3gs,g2gs,A[6]);

//A5

stdor2 OR_C (g3gs,g1gs,A[5]);

//A4 - A0

stdor4 OR_D (g0A[4],g1A[4],g2A[4],g3A[4],A[4]);
stdor4 OR_E (g0A[3],g1A[3],g2A[3],g3A[3],A[3]);
stdor4 OR_F (g0A[2],g1A[2],g2A[2],g3A[2],A[2]);
stdor4 OR_G (g0A[1],g1A[1],g2A[1],g3A[1],A[1]);
stdor4 OR_H (g0A[0],g1A[0],g2A[0],g3A[0],A[0]);

endmodule 
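The way the four 32-input groups are chained can be checked with a behavioral model. In the Python sketch below, which is illustrative and not part of the thesis, prio_enc_lo is a hypothetical behavioral stand-in for one enc32to5lo group (lowest index wins; disabled groups drive all outputs low), and enc128to7lo follows the wiring above: group 0 receives the external ei, each group's eo enables the next higher group, A[6] and A[5] come from the group-select outputs, and A[4:0] is the OR of the group addresses.

```python
# Behavioral sketch (not part of the design files) of how enc128to7lo chains
# four 32-input groups: each group's eo enables the next higher group.


def prio_enc_lo(bits, n, ei=1):
    """Priority encoder, lowest index wins. Returns (A, gs, eo)."""
    if not ei:
        return 0, 0, 0            # disabled: all outputs low
    for i in range(n):
        if (bits >> i) & 1:
            return i, 1, 0        # something found: gs high, eo low
    return 0, 0, 1                # enabled but nothing found: pass enable on


def enc128to7lo(bits, ei=1):
    """Compose four 32-to-5 encoders the way ENCg0..ENCg3 are wired."""
    gs = [0, 0, 0, 0]
    a_low = 0
    enable = ei                   # ENCg0 gets the external ei
    for g in range(4):
        a, gs[g], enable = prio_enc_lo((bits >> (32 * g)) & 0xFFFFFFFF, 32, enable)
        a_low |= a                # OR_D..OR_H: only the winning group is non-zero
    a6 = gs[3] | gs[2]            # OR_B
    a5 = gs[3] | gs[1]            # OR_C
    return (a6 << 6) | (a5 << 5) | a_low, int(any(gs)), enable  # enable = ENCg3 eo
```

Because groups below the winner have nothing set and groups above it are disabled, only one group contributes to A[4:0], which is why simple OR gates suffice.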



5. Thirty-Two-Input, Five-Output Encoder, Priority to Low Bits



/******************************************************************************
* 32 TO 5 ENCODER, PRIORITY LOW
* Filename: enc32to5lo.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 13FEB96
*
* Purpose: This structural model is a 32-bit input, 5-bit output priority encoder. The highest priority is given to the
* bit with the lowest index. Inputs and outputs are active high.
*
* This module is composed of four 8 to 3 priority encoders and the logic gates necessary to connect them
* together. This module is a building block for the 128 to 7 priority encoder.
******************************************************************************/

module enc32to5lo (i,A,ei,eo,gs);

// epoch set_attribute FIXEDBLOCK = 1 

input [31:0] i; 
input ei; 
output [4:0] A; 
output gs,eo; 

wire [2:0] g0A,g1A,g2A,g3A;

wire g3eo,g2eo,g1eo,g3gs,g2gs,g1gs,g0gs,eo,gs;

enc8to3lo ENCg3 (i[31:24],g3A,g2eo,eo,g3gs);
enc8to3lo ENCg2 (i[23:16],g2A,g1eo,g2eo,g2gs);
enc8to3lo ENCg1 (i[15: 8],g1A,g0eo,g1eo,g1gs);
enc8to3lo ENCg0 (i[ 7: 0],g0A,ei,g0eo,g0gs);

//Group Select

stdor4 OR_A (g3gs,g2gs,g1gs,g0gs,gs);

//A4

stdor2 OR_B (g3gs,g2gs,A[4]);

//A3

stdor2 OR_C (g3gs,g1gs,A[3]);

//A2 - A0

stdor4 OR_D (g0A[2],g1A[2],g2A[2],g3A[2],A[2]);
stdor4 OR_E (g0A[1],g1A[1],g2A[1],g3A[1],A[1]);
stdor4 OR_F (g0A[0],g1A[0],g2A[0],g3A[0],A[0]);



endmodule 



6. Eight-Input, Three-Output Encoder, Priority to Low Bits



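/******************************************************************************
* 8 TO 3 ENCODER, PRIORITY LOW
* Filename: enc8to3lo.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 13FEB96
*
* Purpose: This structural model is an 8-bit input, 3-bit output priority encoder. The highest priority is given to the
* bit with the lowest index. Inputs and outputs are active high.
*
* Truth table
*
*              Inputs            |     Outputs
*    EI I7 I6 I5 I4 I3 I2 I1 I0  |  A2 A1 A0 GS EO
*     0  x  x  x  x  x  x  x  x  |   0  0  0  0  0
*     1  1  0  0  0  0  0  0  0  |   1  1  1  1  0
*     1  x  1  0  0  0  0  0  0  |   1  1  0  1  0
*     1  x  x  1  0  0  0  0  0  |   1  0  1  1  0
*     1  x  x  x  1  0  0  0  0  |   1  0  0  1  0
*     1  x  x  x  x  1  0  0  0  |   0  1  1  1  0
*     1  x  x  x  x  x  1  0  0  |   0  1  0  1  0
*     1  x  x  x  x  x  x  1  0  |   0  0  1  1  0
*     1  x  x  x  x  x  x  x  1  |   0  0  0  1  0
*     1  0  0  0  0  0  0  0  0  |   0  0  0  0  1
******************************************************************************/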



module enc8to3lo (I,A,EI,EO,GS);

// epoch set_attribute FIXEDBLOCK = 1 

input [7:0] I; 
input EI;
output [2:0] A; 
output GS,EO; 
wire [7:0] I_;
wire [2:0] A;
wire EI_,GS,EO,EO_;
supply1 VDD;

//Standard cell implementation is more efficient here. See User Man. 5-34.
inv #(8,0,"AUTO","1") INVI (.IN0(I),.Y(I_));
stdinv INV_AA (.IN0(EI),.Y(EI_));

//Enable Output

stdand4 AND_A (EI,I_[7],I_[6],I_[5],w1);
stdand4 AND_B (I_[4],I_[3],I_[2],I_[1],w2);
stdand3 AND_C (I_[0],w1,w2,EO);

//Group Select ("Got Something")
stdnor2 NOR_D (EI_,EO,GS);

//Encode A2 = EI.(I7.I6_.I5_.I4_.I3_.I2_.I1_.I0_ + I6.I5_.I4_.I3_.I2_.I1_.I0_ +
//                I5.I4_.I3_.I2_.I1_.I0_ + I4.I3_.I2_.I1_.I0_)
//           = EI.(I7.I3_.I2_.I1_.I0_ + I6.I3_.I2_.I1_.I0_ +
//                I5.I3_.I2_.I1_.I0_ + I4.I3_.I2_.I1_.I0_)
//           = EI.I3_.I2_.I1_.I0_.(I7 + I6 + I5 + I4)

stdor4 OR_E (I[7],I[6],I[5],I[4],w5);
stdand4 AND_F (EI,I_[3],I_[2],I_[1],w6);
stdand3 AND_FA (I_[0],w5,w6,A[2]);

//Encode A1 = EI.(I7.I6_.I5_.I4_.I3_.I2_.I1_.I0_ + I6.I5_.I4_.I3_.I2_.I1_.I0_ +
//                I3.I2_.I1_.I0_ + I2.I1_.I0_)
//           = EI.I1_.I0_.(I7.I5_.I4_ + I6.I5_.I4_ + I3 + I2)

stdand3 AND_G (I[7],I_[5],I_[4],w10);
stdand3 AND_H (I[6],I_[5],I_[4],w11);
stdor4 OR_I (w10,w11,I[3],I[2],w12);
stdand4 AND_J (EI,I_[1],I_[0],w12,A[1]);

//Encode A0 = EI.(I7.I6_.I5_.I4_.I3_.I2_.I1_.I0_ + I5.I4_.I3_.I2_.I1_.I0_ +
//                I3.I2_.I1_.I0_ + I1.I0_)
//           = EI.I0_.(I7.I6_.I4_.I2_ + I5.I4_.I2_ + I3.I2_ + I1)

stdand4 AND_K (I[7],I_[6],I_[4],I_[2],w15);
stdand3 AND_L (I[5],I_[4],I_[2],w16);
stdand2 AND_M (I[3],I_[2],w17);
stdor4 OR_N (w15,w16,w17,I[1],w18);
stdand3 AND_P (EI,I_[0],w18,A[0]);

endmodule 
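Because the 8-to-3 encoder has only 512 input combinations, the factored equations in its comments can be checked exhaustively against the truth table in the module header. The Python sketch below is an illustrative check, not a thesis tool: enc8to3lo_gates evaluates the same sums of products that the gates implement (with n[k] standing for I_[k]), and enc8to3lo_spec is a hypothetical behavioral statement of the truth table.

```python
# Exhaustive check (illustrative, not a thesis tool) that the factored
# equations in enc8to3lo.v reproduce the truth table in its header.


def enc8to3lo_gates(I, EI):
    """Evaluate the gate equations. I is an 8-bit int, EI is 0 or 1."""
    b = [(I >> k) & 1 for k in range(8)]   # I[k]
    n = [1 - x for x in b]                 # I_[k]
    EO = EI & n[7] & n[6] & n[5] & n[4] & n[3] & n[2] & n[1] & n[0]
    GS = 1 - ((1 - EI) | EO)               # NOR_D (EI_, EO)
    A2 = EI & n[3] & n[2] & n[1] & n[0] & (b[7] | b[6] | b[5] | b[4])
    A1 = EI & n[1] & n[0] & ((b[7] & n[5] & n[4]) | (b[6] & n[5] & n[4]) | b[3] | b[2])
    A0 = EI & n[0] & ((b[7] & n[6] & n[4] & n[2]) | (b[5] & n[4] & n[2])
                      | (b[3] & n[2]) | b[1])
    return (A2 << 2) | (A1 << 1) | A0, GS, EO


def enc8to3lo_spec(I, EI):
    """Truth table: lowest set bit wins; EO means 'enabled, nothing set'."""
    if not EI:
        return 0, 0, 0
    if I == 0:
        return 0, 0, 1
    return (I & -I).bit_length() - 1, 1, 0


mismatches = [(I, EI) for I in range(256) for EI in (0, 1)
              if enc8to3lo_gates(I, EI) != enc8to3lo_spec(I, EI)]
print(mismatches)   # prints []
```

An empty list confirms that each simplification step (for example, dropping I6_ from the I7 term of A1) only removes literals that the other product terms already cover.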
7. Line Replacement Unit



/******************************************************************************
* LINE REPLACEMENT UNIT
* Filename: line_replacement_unit.v
* Author: Joseph R. Robert, Jr.
* Date: 21DEC95
* Revised: 13FEB96
*
* Purpose: This structural model determines the next line to be replaced whenever the PRC needs to start a new
* line. It first selects invalid lines. If all the lines are valid, it selects lines that have been "aged". A priority
* encoder (ENC1) chooses the line with the lowest index among all the lines that can be replaced. If no line can be
* replaced, the encoder's enable output (eo) is used to cause aging. A line X can be replaced if the following
* holds true for that line:
*
*    not (X = ActiveLine) AND {not Valid[X] OR (all_lines_valid AND Aged[X])}
*
* Aging is accomplished with a 7-bit counter (ager_counter), initially set to zero. When the
* cause_aging signal from the encoder is high, the counter advances. A decoder (DEC_B) output causes the
* appropriate Aged flag to be set.
******************************************************************************/

module line_replacement_unit( Valid, ActiveLine,all_lines_valid, 

new_replace,fetch_done,HRESET_,CLK,ReplaceLine); 

// epoch set_attribute FIXEDBLOCK = 1 

input [127:0] Valid; 
input [6:0] ActiveLine; 

input all_lines_valid,new_replace,fetch_done,HRESET_,CLK; 
output [6:0] ReplaceLine; 

supply1 Vdd;

wire [127:0] w1,w2,w4,w5,w6,w7,set_,reset_,Aged,fetch_done128,
             all_lines_valid128,HRESET128_;
wire [6:0] ager_line,ReplaceLine,HRESET7_;
wire ager_en,cause_aging,latch_en,latch_en_buf,nc1,nc2;
split128 fetch_done_split (fetch_done,fetch_done128);
split128 alv_split (all_lines_valid,all_lines_valid128);
split128 HRESET_split (HRESET_,HRESET128_);
split7 HRESET_split7 (HRESET_,HRESET7_);

decoder #(8,128,"verilog/dec7to128e.codefile","2")
  DEC_A (.SEL({Vdd,ActiveLine[0],ActiveLine[1],ActiveLine[2],
               ActiveLine[3],ActiveLine[4],ActiveLine[5],ActiveLine[6]}),
         .Y(w1));

decoder_inv #(8,128,"verilog/dec7to128e.codefile","2")
  DEC_B (.SEL({Vdd,ager_line[0],ager_line[1],ager_line[2],
               ager_line[3],ager_line[4],ager_line[5],ager_line[6]}),.YBAR(set_));
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nand2#(128,0, M AUTO’7T')NAND_A (.IN0(wl),.INl(fetch_donel28),.Y(w2)); 
and2 #(128,0, "AUTO’VT') AND_E (.IN0(w2),.INl(HRESET128J,.Y(resetJ); 
srlatchl28 AgedReg (set_,reset_,Aged); 
scntr_c #(7A' , AUTO , 7 , r) 

ager_counter (.CLK(ager_en),.CLR(HRESET_),.EN(Vdd),.C0UT(nc2), 
•Q(ager_line)); 

stdand2 ANDJF (.INO(CLK),.INl(cause_aging),.Y(ager_en)); 
nand2 #(128,0,’ , AUTO , 7 , r) 

NAND.B ( .IN0(all_lines_validl28),. INI (Aged), .Y(w4)); 
and2 #( 128,0,” AUTO”, M 1 M ) AND_C (.IN0(w4),.IN 1( Valid),. Y(w5)); 
nor 2 #(128,0,"AUTO'7T') NOR_D (.INO(wl),.INl(w5),.Y(w6)); 
stdor2 OR_F (.INO(new_replace),.INl(cause_aging),.Y(latch_en)); 
stdbuf #("19") LatchEnableBuffer (.INO(latch_en),.Y(latch_en_buf)); 
encl28to71o ENC1 (.I(w7),.A(ReplaceLine),.ei(Vdd), 

.eo(cause_aging),.gs(ncl)); 
latch_c #(128,0, ,, AUTO", , 'l ") 

ReplaceLineLatch (.EN(latch_enJ}uf),.CLR(HRESETJ,.D(w6),.Q(w7)); 
endmodule 
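The replacement rule stated in the header can be cross-checked against a behavioral model. The following is a minimal sketch under stated assumptions, not the author's design: the module name `lru_select_sketch` and its parameters `N` and `W` are hypothetical, only the combinational selection is modeled (the Aged flags, ager counter, and ReplaceLine latch are left out), and the lowest-index priority follows the Purpose text rather than any documented behavior of the Epoch encoder cell.

```verilog
// Behavioral sketch of the line-selection rule (hypothetical module name).
// A line X is replaceable when:
//   X != ActiveLine AND (!Valid[X] OR (all_lines_valid AND Aged[X]))
module lru_select_sketch #(parameter N = 128, parameter W = 7) (
    input  wire [N-1:0] valid,
    input  wire [N-1:0] aged,
    input  wire         all_lines_valid,
    input  wire [W-1:0] active_line,
    output reg  [W-1:0] replace_line,
    output wire         none_replaceable
);
    wire [N-1:0] candidate;

    genvar g;
    generate
        for (g = 0; g < N; g = g + 1) begin : cand
            // Not the active line, and either invalid or
            // (every line valid and this one aged).
            assign candidate[g] = (active_line != g) &&
                                  (!valid[g] || (all_lines_valid && aged[g]));
        end
    endgenerate

    // Analogue of the encoder's eo output: high when nothing can be replaced.
    assign none_replaceable = ~|candidate;

    // Lowest-index priority encoder: scan downward so the smallest index wins.
    integer i;
    always @* begin
        replace_line = {W{1'b0}};
        for (i = N - 1; i >= 0; i = i - 1)
            if (candidate[i]) replace_line = i;
    end
endmodule
```

With `N = 128` and `W = 7` the sketch matches the widths of the structural module; smaller values are only useful to keep a testbench short.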



8. OR Gate With 128 Inputs, One Output 



/****************************************************************************

128-INPUT OR GATE 
Filename: or128.v
Author: Joseph R. Robert, Jr. 

Date: 21DEC95 

Revised: 23JAN96 



****************************************************************************/



module or128 (in,out); // An OR tree, equivalent to a 128-input OR gate.
input [127:0] in; 
output out; 
wire out; 



wire [31:0] A; 
wire [7:0] B; 
wire [1:0] C; 

or4 #(32,0,"AUTO","1") OR_A (.IN0(in[127:96]),.IN1(in[95:64]),
.IN2(in[63:32]),.IN3(in[31:0]),.Y(A));
or4 #(8,0,"AUTO","1") OR_B (.IN0(A[31:24]),.IN1(A[23:16]),
.IN2(A[15:8]),.IN3(A[7:0]),.Y(B));
or4 #(2,0,"AUTO","1") OR_C (.IN0(B[7:6]),.IN1(B[5:4]),
.IN2(B[3:2]),.IN3(B[1:0]),.Y(C));
stdor2 OR_D (.IN0(C[1]),.IN1(C[0]),.Y(out));
endmodule
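The tree above reduces 128 inputs to 32, 8, 2, and finally 1. A behavioral reference such as the following (hypothetical module name `or128_ref`) could be simulated beside the structural netlist; it reproduces the same slicing as OR_A through OR_D and, for comparison, Verilog's reduction-OR operator, which should always agree with it.

```verilog
// Behavioral reference for the 128-input OR tree (hypothetical module name).
module or128_ref (
    input  wire [127:0] in,
    output wire         out_reduce,  // single reduction operator
    output wire         out_tree     // same slicing as OR_A..OR_D
);
    // Level 1: four 32-bit slices OR'ed bitwise (OR_A).
    wire [31:0] a = in[127:96] | in[95:64] | in[63:32] | in[31:0];
    // Level 2: four 8-bit slices (OR_B).
    wire [7:0]  b = a[31:24] | a[23:16] | a[15:8] | a[7:0];
    // Level 3: four 2-bit slices (OR_C).
    wire [1:0]  c = b[7:6] | b[5:4] | b[3:2] | b[1:0];

    assign out_tree   = c[1] | c[0];   // final 2-input OR (OR_D)
    assign out_reduce = |in;
endmodule
```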



9. Predicted Memory Address List



/**************************************************************************** 

PREDICTED MEMORY ADDRESS LIST 
Filename: predma_list.v
Author: Joseph R. Robert, Jr. 

Date: 21DEC95 

Revised: 13FEB96 

Purpose: This structural model is a register file for storing predicted memory addresses. This list is composed of
128 address registers, 128 equality comparators, and 128 Valid status flags. When fetch_done pulses, the NAR is
stored in the line selected by ActiveLine and that line's Valid flag is set; a flush clears the Valid flag of the
line selected by ActiveLine. If there is a match with the input address (in_addr), a priority encoder (ENC_C)
determines which line matches.

module predicted_ma_list (NAR,in_addr,ActiveLine,fetch_done,flush,HRESET_, 

Valid,match_line,match);

// epoch set_attribute FIXEDBLOCK = 1 

input [26:0] NAR, in_addr; 
input [6:0] ActiveLine; 
input fetch_done,flush,HRESET_; 
output [127:0] Valid; 
output [6:0] match_line;
output match; 

wire [127:0] store_en,store_en_buf,flush_enable_,set_,reset_, 

Valid,equal,m,HRESET128_; 
wire nc1,nc2;
supply1 Vdd;

split128 hreset_splitter (.in(HRESET_),.out(HRESET128_));
decoder #(8,128,"verilog/dec7to128e.codefile","2")

DEC_A (.SEL({fetch_done,ActiveLine[0],ActiveLine[1],ActiveLine[2],
ActiveLine[3],ActiveLine[4],ActiveLine[5],ActiveLine[6]}),

.Y(store_en));

buff #(128,0,"AUTO","8") StoreEnBuffer (.IN0(store_en),.Y(store_en_buf));
decoder_inv #(8,128,"verilog/dec7to128e.codefile","2")

DEC_B (.SEL({flush,ActiveLine[0],ActiveLine[1],ActiveLine[2],
ActiveLine[3],ActiveLine[4],ActiveLine[5],ActiveLine[6]}),

.YBAR(flush_enable_));

inv #(128,0,"AUTO","1") INV_A (.IN0(store_en_buf),.Y(set_));
and2 #(128,0,"AUTO","1")

AND_B (.IN0(flush_enable_),.IN1(HRESET128_),.Y(reset_));
srlatch128 Valid_latch (.S_(set_),.R_(reset_),.Q(Valid));
and2 #(128,0,"AUTO","1")
AND_C (.IN0(Valid),.IN1(equal),.Y(m));

or128 MATCH_OR (.in(m),.out(match));

enc128to7lo ENC_C (.I(m),.A(match_line),.ei(Vdd),.eo(nc1),.gs(nc2));
// epoch precompiled addre 

addre PredMA0 (NAR,in_addr,store_en_buf[0],equal[0],HRESET_);
addre PredMA1 (NAR,in_addr,store_en_buf[1],equal[1],HRESET_);
addre PredMA2 (NAR,in_addr,store_en_buf[2],equal[2],HRESET_);
addre PredMA3 (NAR,in_addr,store_en_buf[3],equal[3],HRESET_);
addre PredMA4 (NAR,in_addr,store_en_buf[4],equal[4],HRESET_);
addre PredMA5 (NAR,in_addr,store_en_buf[5],equal[5],HRESET_);
addre PredMA6 (NAR,in_addr,store_en_buf[6],equal[6],HRESET_);
addre PredMA7 (NAR,in_addr,store_en_buf[7],equal[7],HRESET_);
addre PredMA8 (NAR,in_addr,store_en_buf[8],equal[8],HRESET_);
addre PredMA9 (NAR,in_addr,store_en_buf[9],equal[9],HRESET_);
addre PredMA10 (NAR,in_addr,store_en_buf[10],equal[10],HRESET_);
addre PredMA11 (NAR,in_addr,store_en_buf[11],equal[11],HRESET_);
addre PredMA12 (NAR,in_addr,store_en_buf[12],equal[12],HRESET_);
addre PredMA13 (NAR,in_addr,store_en_buf[13],equal[13],HRESET_);
addre PredMA14 (NAR,in_addr,store_en_buf[14],equal[14],HRESET_);
addre PredMA15 (NAR,in_addr,store_en_buf[15],equal[15],HRESET_);
addre PredMA16 (NAR,in_addr,store_en_buf[16],equal[16],HRESET_);
addre PredMA17 (NAR,in_addr,store_en_buf[17],equal[17],HRESET_);
addre PredMA18 (NAR,in_addr,store_en_buf[18],equal[18],HRESET_);
addre PredMA19 (NAR,in_addr,store_en_buf[19],equal[19],HRESET_);
addre PredMA20 (NAR,in_addr,store_en_buf[20],equal[20],HRESET_);
addre PredMA21 (NAR,in_addr,store_en_buf[21],equal[21],HRESET_);
addre PredMA22 (NAR,in_addr,store_en_buf[22],equal[22],HRESET_);
addre PredMA23 (NAR,in_addr,store_en_buf[23],equal[23],HRESET_);
addre PredMA24 (NAR,in_addr,store_en_buf[24],equal[24],HRESET_);
addre PredMA25 (NAR,in_addr,store_en_buf[25],equal[25],HRESET_);
addre PredMA26 (NAR,in_addr,store_en_buf[26],equal[26],HRESET_);
addre PredMA27 (NAR,in_addr,store_en_buf[27],equal[27],HRESET_);
addre PredMA28 (NAR,in_addr,store_en_buf[28],equal[28],HRESET_);
addre PredMA29 (NAR,in_addr,store_en_buf[29],equal[29],HRESET_);
addre PredMA30 (NAR,in_addr,store_en_buf[30],equal[30],HRESET_);
addre PredMA31 (NAR,in_addr,store_en_buf[31],equal[31],HRESET_);
addre PredMA32 (NAR,in_addr,store_en_buf[32],equal[32],HRESET_);
addre PredMA33 (NAR,in_addr,store_en_buf[33],equal[33],HRESET_);
addre PredMA34 (NAR,in_addr,store_en_buf[34],equal[34],HRESET_);
addre PredMA35 (NAR,in_addr,store_en_buf[35],equal[35],HRESET_);
addre PredMA36 (NAR,in_addr,store_en_buf[36],equal[36],HRESET_);
addre PredMA37 (NAR,in_addr,store_en_buf[37],equal[37],HRESET_);
addre PredMA38 (NAR,in_addr,store_en_buf[38],equal[38],HRESET_);
addre PredMA39 (NAR,in_addr,store_en_buf[39],equal[39],HRESET_);
addre PredMA40 (NAR,in_addr,store_en_buf[40],equal[40],HRESET_);
addre PredMA41 (NAR,in_addr,store_en_buf[41],equal[41],HRESET_);
addre PredMA42 (NAR,in_addr,store_en_buf[42],equal[42],HRESET_);
addre PredMA43 (NAR,in_addr,store_en_buf[43],equal[43],HRESET_);
addre PredMA44 (NAR,in_addr,store_en_buf[44],equal[44],HRESET_);
addre PredMA45 (NAR,in_addr,store_en_buf[45],equal[45],HRESET_);
addre PredMA46 (NAR,in_addr,store_en_buf[46],equal[46],HRESET_);
addre PredMA47 (NAR,in_addr,store_en_buf[47],equal[47],HRESET_);
addre PredMA48 (NAR,in_addr,store_en_buf[48],equal[48],HRESET_);
addre PredMA49 (NAR,in_addr,store_en_buf[49],equal[49],HRESET_);
addre PredMA50 (NAR,in_addr,store_en_buf[50],equal[50],HRESET_);
addre PredMA51 (NAR,in_addr,store_en_buf[51],equal[51],HRESET_);
addre PredMA52 (NAR,in_addr,store_en_buf[52],equal[52],HRESET_);
addre PredMA53 (NAR,in_addr,store_en_buf[53],equal[53],HRESET_);
addre PredMA54 (NAR,in_addr,store_en_buf[54],equal[54],HRESET_);
addre PredMA55 (NAR,in_addr,store_en_buf[55],equal[55],HRESET_);
addre PredMA56 (NAR,in_addr,store_en_buf[56],equal[56],HRESET_);
addre PredMA57 (NAR,in_addr,store_en_buf[57],equal[57],HRESET_);
addre PredMA58 (NAR,in_addr,store_en_buf[58],equal[58],HRESET_);
addre PredMA59 (NAR,in_addr,store_en_buf[59],equal[59],HRESET_);
addre PredMA60 (NAR,in_addr,store_en_buf[60],equal[60],HRESET_);
addre PredMA61 (NAR,in_addr,store_en_buf[61],equal[61],HRESET_);
addre PredMA62 (NAR,in_addr,store_en_buf[62],equal[62],HRESET_);
addre PredMA63 (NAR,in_addr,store_en_buf[63],equal[63],HRESET_);
addre PredMA64 (NAR,in_addr,store_en_buf[64],equal[64],HRESET_);
addre PredMA65 (NAR,in_addr,store_en_buf[65],equal[65],HRESET_);
addre PredMA66 (NAR,in_addr,store_en_buf[66],equal[66],HRESET_);
addre PredMA67 (NAR,in_addr,store_en_buf[67],equal[67],HRESET_);
addre PredMA68 (NAR,in_addr,store_en_buf[68],equal[68],HRESET_);
addre PredMA69 (NAR,in_addr,store_en_buf[69],equal[69],HRESET_);
addre PredMA70 (NAR,in_addr,store_en_buf[70],equal[70],HRESET_);
addre PredMA71 (NAR,in_addr,store_en_buf[71],equal[71],HRESET_);
addre PredMA72 (NAR,in_addr,store_en_buf[72],equal[72],HRESET_);
addre PredMA73 (NAR,in_addr,store_en_buf[73],equal[73],HRESET_);
addre PredMA74 (NAR,in_addr,store_en_buf[74],equal[74],HRESET_);
addre PredMA75 (NAR,in_addr,store_en_buf[75],equal[75],HRESET_);
addre PredMA76 (NAR,in_addr,store_en_buf[76],equal[76],HRESET_);
addre PredMA77 (NAR,in_addr,store_en_buf[77],equal[77],HRESET_);
addre PredMA78 (NAR,in_addr,store_en_buf[78],equal[78],HRESET_);
addre PredMA79 (NAR,in_addr,store_en_buf[79],equal[79],HRESET_);
addre PredMA80 (NAR,in_addr,store_en_buf[80],equal[80],HRESET_);
addre PredMA81 (NAR,in_addr,store_en_buf[81],equal[81],HRESET_);
addre PredMA82 (NAR,in_addr,store_en_buf[82],equal[82],HRESET_);
addre PredMA83 (NAR,in_addr,store_en_buf[83],equal[83],HRESET_);
addre PredMA84 (NAR,in_addr,store_en_buf[84],equal[84],HRESET_);
addre PredMA85 (NAR,in_addr,store_en_buf[85],equal[85],HRESET_);
addre PredMA86 (NAR,in_addr,store_en_buf[86],equal[86],HRESET_);
addre PredMA87 (NAR,in_addr,store_en_buf[87],equal[87],HRESET_);
addre PredMA88 (NAR,in_addr,store_en_buf[88],equal[88],HRESET_);
addre PredMA89 (NAR,in_addr,store_en_buf[89],equal[89],HRESET_);
addre PredMA90 (NAR,in_addr,store_en_buf[90],equal[90],HRESET_);
addre PredMA91 (NAR,in_addr,store_en_buf[91],equal[91],HRESET_);
addre PredMA92 (NAR,in_addr,store_en_buf[92],equal[92],HRESET_);
addre PredMA93 (NAR,in_addr,store_en_buf[93],equal[93],HRESET_);
addre PredMA94 (NAR,in_addr,store_en_buf[94],equal[94],HRESET_);
addre PredMA95 (NAR,in_addr,store_en_buf[95],equal[95],HRESET_);
addre PredMA96 (NAR,in_addr,store_en_buf[96],equal[96],HRESET_);
addre PredMA97 (NAR,in_addr,store_en_buf[97],equal[97],HRESET_);
addre PredMA98 (NAR,in_addr,store_en_buf[98],equal[98],HRESET_);
addre PredMA99 (NAR,in_addr,store_en_buf[99],equal[99],HRESET_);
addre PredMA100 (NAR,in_addr,store_en_buf[100],equal[100],HRESET_);
addre PredMA101 (NAR,in_addr,store_en_buf[101],equal[101],HRESET_);
addre PredMA102 (NAR,in_addr,store_en_buf[102],equal[102],HRESET_);
addre PredMA103 (NAR,in_addr,store_en_buf[103],equal[103],HRESET_);
addre PredMA104 (NAR,in_addr,store_en_buf[104],equal[104],HRESET_);
addre PredMA105 (NAR,in_addr,store_en_buf[105],equal[105],HRESET_);
addre PredMA106 (NAR,in_addr,store_en_buf[106],equal[106],HRESET_);
addre PredMA107 (NAR,in_addr,store_en_buf[107],equal[107],HRESET_);
addre PredMA108 (NAR,in_addr,store_en_buf[108],equal[108],HRESET_);
addre PredMA109 (NAR,in_addr,store_en_buf[109],equal[109],HRESET_);
addre PredMA110 (NAR,in_addr,store_en_buf[110],equal[110],HRESET_);
addre PredMA111 (NAR,in_addr,store_en_buf[111],equal[111],HRESET_);
addre PredMA112 (NAR,in_addr,store_en_buf[112],equal[112],HRESET_);
addre PredMA113 (NAR,in_addr,store_en_buf[113],equal[113],HRESET_);
addre PredMA114 (NAR,in_addr,store_en_buf[114],equal[114],HRESET_);
addre PredMA115 (NAR,in_addr,store_en_buf[115],equal[115],HRESET_);
addre PredMA116 (NAR,in_addr,store_en_buf[116],equal[116],HRESET_);
addre PredMA117 (NAR,in_addr,store_en_buf[117],equal[117],HRESET_);
addre PredMA118 (NAR,in_addr,store_en_buf[118],equal[118],HRESET_);
addre PredMA119 (NAR,in_addr,store_en_buf[119],equal[119],HRESET_);
addre PredMA120 (NAR,in_addr,store_en_buf[120],equal[120],HRESET_);
addre PredMA121 (NAR,in_addr,store_en_buf[121],equal[121],HRESET_);
addre PredMA122 (NAR,in_addr,store_en_buf[122],equal[122],HRESET_);
addre PredMA123 (NAR,in_addr,store_en_buf[123],equal[123],HRESET_);
addre PredMA124 (NAR,in_addr,store_en_buf[124],equal[124],HRESET_);
addre PredMA125 (NAR,in_addr,store_en_buf[125],equal[125],HRESET_);
addre PredMA126 (NAR,in_addr,store_en_buf[126],equal[126],HRESET_);
addre PredMA127 (NAR,in_addr,store_en_buf[127],equal[127],HRESET_);

endmodule 
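Functionally, the Predicted Memory Address List behaves like a small content-addressable memory. The sketch below is a behavioral approximation, not the author's netlist, and rests on assumptions: storage happens on the rising edge of fetch_done, the Valid flags give reset priority (as srlatch does), and ENC_C gives priority to the lowest matching index. The module name `pred_ma_list_sketch` and its parameters `N`, `W`, and `AW` are hypothetical, and the internals of the precompiled `addre` cell are only approximated by a register plus an equality compare. A generate loop stands in for the 128 explicit instances.

```verilog
// Behavioral sketch of the Predicted Memory Address List (hypothetical name).
module pred_ma_list_sketch #(parameter N = 128, parameter W = 7, parameter AW = 27) (
    input  wire [AW-1:0] nar,
    input  wire [AW-1:0] in_addr,
    input  wire [W-1:0]  active_line,
    input  wire          fetch_done,
    input  wire          flush,
    input  wire          hreset_,
    output reg  [N-1:0]  valid,
    output reg  [W-1:0]  match_line,
    output wire          match
);
    reg  [AW-1:0] addr [0:N-1];   // one address register per line (assumed)
    wire [N-1:0]  m;              // per-line hit: Valid AND equal

    // Store the NAR into the active line when fetch_done pulses.
    always @(posedge fetch_done)
        addr[active_line] <= nar;

    // Valid flags: reset clears all, flush clears the active line,
    // fetch_done sets it.
    always @(posedge fetch_done or posedge flush or negedge hreset_)
        if (!hreset_)   valid <= {N{1'b0}};
        else if (flush) valid[active_line] <= 1'b0;
        else            valid[active_line] <= 1'b1;

    // One comparator per line.
    genvar g;
    generate
        for (g = 0; g < N; g = g + 1) begin : cmp
            assign m[g] = valid[g] && (addr[g] == in_addr);
        end
    endgenerate

    assign match = |m;

    // Lowest-index priority encoder (stands in for ENC_C).
    integer i;
    always @* begin
        match_line = {W{1'b0}};
        for (i = N - 1; i >= 0; i = i - 1)
            if (m[i]) match_line = i;
    end
endmodule
```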



10. One-to-128 Wire Splitter 



/****************************************************************************

1 TO 128 WIRE SPLITTER 






Filename: split128.v
Author: Joseph R. Robert, Jr. 
Date: 21DEC95 
Revised: 23JAN96 



Purpose: Splits a wire into 128 wires. 
****************************************************************************/

module split128 (in,out); // Splits a wire into 128 wires.

input in; 

output [127:0] out; 

assign out = {in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,
in,in,in,in,in,in,in,in,in,in,in,in,in,in,in,in};
endmodule 



11. One-to-Seven Wire Splitter 



/****************************************************************************

1 TO 7 WIRE SPLITTER 
Filename: split7.v 
Author: Joseph R. Robert, Jr. 

Date: 21DEC95 
Revised: 23JAN96 

Purpose: Splits a wire into 7 wires. 

****************************************************************************/ 



module split7 (in,out); // Splits a wire into 7 wires.
input in; 

output [6:0] out; 

assign out = { in,in,in,in,in,in,in }; 
endmodule 
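split128 and split7 only fan one signal out to many wires. A single parameterized module using Verilog's replication operator could cover both. This is a sketch with a hypothetical name, `split_n`; whether Epoch's tools would accept it in place of the explicit lists above is an assumption that has not been tested here.

```verilog
// Parameterized fan-out "splitter" (hypothetical name). {WIDTH{in}} is the
// replication operator: it is equivalent to writing {in,in,...,in} with
// WIDTH copies.
module split_n #(parameter WIDTH = 128) (
    input  wire             in,
    output wire [WIDTH-1:0] out
);
    assign out = {WIDTH{in}};
endmodule
```

Instantiating it with `WIDTH = 128` or `WIDTH = 7` gives the same connectivity as split128 and split7.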



12. Set, Reset Latch 



/****************************************************************************
STANDARD SET,RESET LATCH 
Filename: srlatch.v 
Author: Joseph R. Robert, Jr. 

Date: 21DEC95 
Revised: 23JAN96 
Reset has priority.



module srlatch (S_, R_, Q); 
input S_,R_; 
output Q; 
wire w1,w2,Q,Q_;

stdnand2 NAND_A (.IN0(w2),.IN1(Q_),.Y(Q));
stdnand2 NAND_B (.IN0(R_),.IN1(Q),.Y(Q_));
stdnand2 NAND_C (.IN0(w1),.IN1(R_),.Y(w2));
stdinv INV_D (.IN0(S_),.Y(w1));

endmodule 
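The note that reset has priority can be checked by simulating the same four gates. The sketch below (hypothetical module name `srlatch_gates`) rebuilds srlatch from Verilog's built-in nand and not primitives. The unit delays are an assumption added so the feedback loop settles deterministically in simulation; the Epoch standard cells' real delays differ.

```verilog
// Gate-level replica of srlatch with unit delays (hypothetical name).
// Same wiring as INV_D, NAND_C, NAND_A, NAND_B in the listing above.
module srlatch_gates (input wire S_, input wire R_, output wire Q);
    wire w1, w2, Q_;
    not  #1 INV_D  (w1, S_);        // w1 = ~S_  (set request, active high)
    nand #1 NAND_C (w2, w1, R_);    // w2 low only when set requested and R_ high
    nand #1 NAND_A (Q,  w2, Q_);    // cross-coupled pair forms the latch
    nand #1 NAND_B (Q_, R_, Q);     // R_ low forces Q_ high, hence Q low
endmodule
```

Because R_ feeds both NAND_B and NAND_C, driving R_ low blocks the set path and forces Q low even when S_ is also low.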



13. Set, Reset Latch Array 128 Bits Wide 



/****************************************************************************

ARRAY OF 128 SET,RESET LATCHES 
Filename: srlatch128.v
Author: Joseph R. Robert, Jr. 

Date: 21DEC95 
Revised: 23JAN96 



module srlatch128 (S_, R_, Q); // set-reset latch
input [127:0] S_,R_;
output [127:0] Q;
wire [127:0] w1,w2,Q,Q_;



nand2 #(128,0,"AUTO","1") NAND_A (.IN0(w2),.IN1(Q_),.Y(Q));
nand2 #(128,0,"AUTO","1") NAND_B (.IN0(R_),.IN1(Q),.Y(Q_));
nand2 #(128,0,"AUTO","1") NAND_C (.IN0(w1),.IN1(R_),.Y(w2));
inv #(128,0,"AUTO","1") INV_C (.IN0(S_),.Y(w1));



endmodule 



E. PREDICTOR 



/****************************************************************************

PREDICTOR 
Filename: predictor.v 
Author: Joseph R. Robert, Jr. 

Date: 21DEC95 
Revised: 06FEB96



Purpose: This module calculates the Next Address (stored in the NAR) based on the Most Recent Memory Access
(MRMA) and the Current Address (in the CAR). The prediction calculation is

NAR = 2*CAR - MRMA

In this structural implementation of the Predictor, the predict signal latches in the CAR and MRMA inputs. The
subtraction is accomplished as a 2's complement addition with a high-speed adder.

The CAR is multiplied by 2 by appending a zero at the least significant end. The most significant bit of the
CAR is not retained, since it will not have an effect on the 27-bit output of the adder. This would adversely affect
address prediction only around the mid-point of the 4 gigabytes of memory. The Golden Rule here is "Design for
the common case."

A number is negated in 2's complement by inverting all the bits and adding 1. The MRMA is negated by inverting
all its bits. Adding the required 1 is implemented as a Carry-In to the adder.

Epoch's TACTIC reported the propagation delay from predict to NAR to be 4.90 ns. 






****************************************************************************/



module predictor (MRMA,CAR,predict,NAR,HRESET_);

//CAR is [30:5] of 32-bit address 
//MRMA and NAR are [31:5] of 32-bit address 

// epoch set_attribute FIXEDBLOCK = 1 

input [26:0] MRMA; 
input [25:0] CAR; 
input predict,HRESET_; 
output [26:0] NAR; 

wire [26:0] NAR,A,B,C;
wire nc; 

`define group "predictor"

supply0 gnd;
supply1 vdd;

assign A[0] = gnd;
dff_c #(26,1,`group,"1")

CAR_latch (.D(CAR),.CLK(predict),.CLR(HRESET_),.Q(A[26:1]));
dff_c #(27,1,`group,"1")

MRMA_latch (.D(MRMA),.CLK(predict),.CLR(HRESET_),.Q(C));

bufinv #(27,1,`group,"1","speed")

bit_complement (.IN0(C),.Y(B));
addhs #(27,1,`group,"1")

adder (.A(A),.B(B),.CIN(vdd),.COUT(nc),.SUM(NAR));
endmodule 
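The prediction arithmetic can be checked in isolation. This behavioral sketch (hypothetical module name `predictor_beh`) omits the predict-triggered input registers and reproduces only the datapath described in the Purpose text: doubling the CAR by appending a zero, taking the ones' complement of the MRMA, and supplying the final 1 as a carry-in, all modulo 2^27.

```verilog
// Behavioral sketch of the Predictor datapath (hypothetical name).
// NAR = 2*CAR - MRMA, computed modulo 2^27.
module predictor_beh (
    input  wire [26:0] MRMA,
    input  wire [25:0] CAR,
    output wire [26:0] NAR
);
    wire [26:0] A = {CAR, 1'b0};   // 2*CAR: append a zero at the LSB end
    wire [26:0] B = ~MRMA;         // ones' complement of MRMA
    assign NAR = A + B + 27'd1;    // +1 plays the role of the adder's carry-in
endmodule
```

For a sequential stream where the previous access was line 10 and the current one is line 12, it predicts line 14; a descending stream (6, then 5) predicts 4.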



F. DATA LIST



DATA LIST 
Filename: datalist.v 
Author: Joseph R. Robert, Jr. 

Date: 15DEC95 

Revised: 07FEB96 

Purpose: This module stores the data retrieved from memory in anticipation of a request by the CPU.

The basic memory cell is Epoch's hsramoe (high-speed RAM with output enable). Since each hsram has
a maximum word size of 128 bits, there are two hsram parts in parallel to get the required 256-bit width.

An upload signal causes the Data List to store the data on data_line into the address specified by 
ActiveLine. The input upload has to be inverted to match the active-low WR input of the Epoch hsram component. 

A download signal causes the Data List to assert onto data_line the data in the address specified by
ActiveLine. This signal also has to be inverted for the same reason. 

Both the inverters can probably be removed if the Bus Interface Unit makes the upload and download 
signals active low. That could only improve the response time of this data memory. 

Epoch calculated the following timing delays: 

download -> hsramoe.DOUT 2.3 ns 
ActiveLine -> hsramoe.DOUT 7.3 ns 

A design alternative is to use the regular speed version, ramoe, with the following timing delays. 

download -> ramoe.DOUT 4 ns 
ActiveLine -> ramoe.DOUT 16 ns 

Using this slower RAM is possible, but would require a significant modification to the PRC behavior to handle the
longer delay, and would add a cycle delay to CPU reads when there is a hit in the PRC.

Putting this module's VerilogOut file into the original PRC behavioral model for mixed-mode simulation
caused a timing error that had to be corrected in the Bus Interface Unit. After an upload to the Data List, data_line
must remain valid long enough to meet the data hold time requirement of Epoch's hsramoe.



module datalist (data_line,ActiveLine,upload,download);
//epoch set_attribute FIXEDBLOCK = 1 

input [6:0] ActiveLine;
input upload,download;
inout [255:0] data_line;



wire [255:0] data_line;
wire write_,enable_; 

//STRUCTURE 

stdbufinv upload_inv (.IN0(upload),.Y(write_));
stdbufinv download_inv (.IN0(download),.Y(enable_));
hsramoe #(128,128,7,32,1,"1")

data_ram1 (.A(ActiveLine),.DIN(data_line[127:0]),.DOUT(data_line[127:0]),
.WR(write_),.OE(enable_));
hsramoe #(128,128,7,32,1,"1")

data_ram0 (.A(ActiveLine),.DIN(data_line[255:128]),.DOUT(data_line[255:128]),
.WR(write_),.OE(enable_));

endmodule 
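A behavioral stand-in for the Data List can be handy when simulating the Bus Interface without the Epoch RAM models. This sketch assumes a level-sensitive write while upload is high and a tri-state read while download is high. It does not model hsramoe setup and hold times, so it would not reproduce the hold-time problem described above. The name `datalist_beh` and its parameters are hypothetical.

```verilog
// Behavioral sketch of the Data List (hypothetical name). Active-high
// upload/download, as seen by this module before its input inverters.
module datalist_beh #(parameter LINES = 128, parameter AW = 7, parameter DW = 256) (
    inout  wire [DW-1:0] data_line,
    input  wire [AW-1:0] ActiveLine,
    input  wire          upload,
    input  wire          download
);
    reg [DW-1:0] mem [0:LINES-1];

    // Write: while upload is high, capture data_line into the addressed line.
    always @(upload or ActiveLine or data_line)
        if (upload) mem[ActiveLine] <= data_line;

    // Read: drive data_line only while download is high; otherwise release it.
    assign data_line = download ? mem[ActiveLine] : {DW{1'bz}};
endmodule
```

Asserting upload and download together would make the model drive and capture data_line at once, so the surrounding logic has to keep them mutually exclusive.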



G. BUS INTERFACE



* BUS INTERFACE UNIT 

* Filename: bus_interface.v

* Author: Joseph R. Robert, Jr. 

* Date: 09OCT95 

* Revised: 20MAR96 

* 

Purpose: This module connects the PRC with the system bus. It handles the protocol of data transfer in and out 
of the PRC. 

When this module receives a fetch signal, it latches the address into the NAR and requests the bus for a
burst read. It stores the incoming data until all four beats of the burst have been received. Then it uploads the data
into the Data List and asserts fetch_done. If there is a parity error during the fetch, the Bus Interface informs the
Controller by asserting fetch_abort, and the transaction is cancelled.

When this module receives a send signal, it sends a cancel signal (CANX) to the memory module, 
downloads data from the Data List, and then sends the data to the CPU. When the transfer is finished, it asserts 
send_done. 

The coordination of these activities is accomplished through the use of two Finite State Machines. One
acts as an address bus master, and the other controls the flow of data.

* 
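Both grant qualifiers in the module that follows are active-low expressions: qual_BG_ = ~(ABB_ & !BG_) and qual_DBG_ = ~(DBB_ & !DBG_). Each goes low only when the bus is granted (BG_ or DBG_ low) and the corresponding busy line is negated (ABB_ or DBB_ high). A minimal sketch that isolates these two assignments for a truth-table check; the wrapper name `grant_qual` is hypothetical, and only the two expressions are taken from the listing.

```verilog
// Grant qualification as written in bus_interface (hypothetical wrapper).
// All signals are active low; qual_*_ is 0 only when the bus is granted
// (BG_/DBG_ = 0) and not busy (ABB_/DBB_ = 1).
module grant_qual (
    input  wire BG_, ABB_, DBG_, DBB_,
    output wire qual_BG_, qual_DBG_
);
    assign qual_BG_  = ~(ABB_ & !BG_);
    assign qual_DBG_ = ~(DBB_ & !DBG_);
endmodule
```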

module bus_interface (NAR_IN,BURSTSTART,BG_,AACK_,DBG_,send,fetch,
clk,BR_,upload,download,fetch_done,fetch_abort,
send_done,CANX,snoop_ignore,DATALINE,D,A,AP,DP,DPE_,
TT,TSIZ,TC,ABB_,TS_,TBST_,DBB_,TA_,HRESET_);
//epoch set_attribute FIXEDBLOCK = 1

// Signals are defined in system.v.

input [26:0] NAR_IN;

input [1:0] BURSTSTART; 

input BG_,AACK_,DBG_,send,fetch,clk,HRESET_; 

output BR_,upload,download,fetch_done,fetch_abort;

output send_done,DPE_,CANX,snoop_ignore; 

inout [255:0] DATALINE; 

inout [63:0] D; 

inout [31:0] A; 

inout [7:0] DP; 

inout [4:0] TT; 

inout [3:0] AP; 

inout [2:0] TSIZ; 

inout [1:0] TC; 

inout ABB_,TS_,TBST_,DBB_,TA_;

tri [255:0] DATALINE; 

tri [63:0] D; 

tri [31:0] A; 

tri [7:0] DP; 

tri [4:0] TT; 

tri [3:0] AP; 

tri [2:0] TSIZ; 

tri [1:0] TC; 

tri ABB_,TS_,TBST_,DBB_,TA_,DPE_; 

supply1 VDD;
supply0 GND;

//Address section wires 
wire [26:0] a_reg,NAR;
wire [3:0] ap_reg,addr_parity_gen;
wire qual_BG_; 

//Data section wires 

wire [255:0] data,mux_out; 

wire [31:0] dparity,dparity_gen; 

//wire [3:0] dreg_clk; 
wire [1:0] burst_start; 

wire bs_clk,dreg0_clk,dreg1_clk,dreg2_clk,dreg3_clk,data_parity_error,qual_DBG_,
dreg0_clk_buf,dreg1_clk_buf,dreg2_clk_buf,dreg3_clk_buf,a_en_buf_,CANX,
dataline_en_buf_,d_en0_buf_,d_en1_buf_,d_en2_buf_,d_en3_buf_,ta,
latch0_delay,latch1_delay,latch2_delay,latch3_delay;

//ADDRESS BUS INTERFACE 

assign qual_BG_ = ~(ABB_ & !BG_);
// Next Address Register
dff #(27,0,"AUTO","AUTO") 

NextAddressReg (.CLK(NARLatch),.D(NAR_IN),.Q(NAR)); 

//Generate address parity. 
parityo_gen32 

AddrParityGen (.D({NAR,GND,GND,GND,GND,GND}),.PGEN(addr_parity_gen));

//Address Output Registers and buffers
dff #(27,0,"AUTO","AUTO")

AddressReg (.CLK(a_latch),.D(NAR),.Q(a_reg));
dff #(4,0,"AUTO","AUTO")

AddrParReg (.CLK(a_latch),.D(addr_parity_gen),.Q(ap_reg));
tribuf #(32,0,"AUTO","AUTO")

a_buffer (.EN(a_en_buf_),.IN0({a_reg,GND,GND,GND,GND,GND}),.Y(A));
stdbuf #("9") AEN_BUF (.IN0(a_en_),.Y(a_en_buf_));
tribuf #(4,0,"AUTO","AUTO")
ap_buffer (.EN(a_en_),.IN0(ap_reg),.Y(AP));
tribuf #(5,0,"AUTO","AUTO")

tt_buffer (.EN(a_en_),.IN0({GND,VDD,VDD,VDD,GND}),.Y(TT));
tribuf #(3,0,"AUTO","AUTO")

tsize_buffer (.EN(a_en_),.IN0({GND,VDD,GND}),.Y(TSIZ));
tribuf #(2,0,"AUTO","AUTO")
tcode_buffer (.EN(a_en_),.IN0({GND,GND}),.Y(TC));

stdtribuf abb_buffer (.EN(abb_en_),.IN0(abb_reg_),.Y(ABB_));
stdtribuf tbst_buffer (.EN(tbst_en_),.IN0(GND),.Y(TBST_));
stdtribuf ts_buffer (.EN(ts_en_),.IN0(ts_reg_),.Y(TS_));



//ADDRESS FINITE STATE MACHINE 



parameter // epoch enum astat 
A_IDLE = 3'd0,
WAIT_FOR_BG = 3'd1,
MASTER = 3'd2,
TRANSFER = 3'd3,
WAIT_FOR_AACK = 3'd4,
TERMINATION = 3'd5,
WAIT_FOR_NOT_FETCH = 3'd6,
dc_astate = 3'bxx;

reg [2:0] /* epoch enum astat */ astate, next_astate; 
reg a_latch,a_en_,abb_reg_,abb_en_,BR_,NARLatch,snoop_ignore,
tbst_en_,ts_reg_,ts_en_;

always @(posedge clk or negedge HRESET_)
begin

if (!HRESET_)
astate = A_IDLE;
else

astate = next_astate;
end

always @(astate or fetch or qual_BG_ or AACK_)
begin 

//default values 

a_latch = 1'b0;
a_en_ = 1'b1;
abb_reg_ = 1'b1;
abb_en_ = 1'b1;
BR_ = 1'b1;
NARLatch = 1'b0;
snoop_ignore = 1'b0;
tbst_en_ = 1'b1;
ts_reg_ = 1'b1;
ts_en_ = 1'b1;

case (astate) 

A_IDLE:
begin 
if (fetch) 

next_astate = WAIT_FOR_BG; 
else next_astate = A_IDLE; 
end 

WAIT_FOR_BG: 

begin 

BR_ = 1'b0; // Request the bus.

NARLatch = 1'b1; // Latch the Next Address.
if (qual_BG_ == 1'b0)
next_astate = MASTER; 
else next_astate = WAIT_FOR_BG; 
end 

MASTER: 

begin 

a_latch = 1'b1; // Latch transfer attributes.
a_en_ = 1'b0; // Enable attribute outputs.
abb_reg_ = 1'b0; // Take the address bus.
abb_en_ = 1'b0;

snoop_ignore = 1'b1; // Tell snooper to ignore this transaction.
tbst_en_ = 1'b0; // Another transfer attribute.
ts_reg_ = 1'b0; // Start the transfer.

ts_en_ = 1'b0;

next_astate = TRANSFER; 
end 
TRANSFER:

begin

a_en_ = 1'b0;
abb_reg_ = 1'b0;
abb_en_ = 1'b0;
snoop_ignore = 1'b1;
tbst_en_ = 1'b0;
ts_reg_ = 1'b1;

ts_en_ = 1'b0;

if (AACK_ == 1'b1)
next_astate = WAIT_FOR_AACK;
else next_astate = TERMINATION;
end

WAIT_FOR_AACK:

begin

a_en_ = 1'b0;
abb_reg_ = 1'b0;
abb_en_ = 1'b0;
snoop_ignore = 1'b1;
tbst_en_ = 1'b0;
if (AACK_ == 1'b1)
next_astate = WAIT_FOR_AACK;
else next_astate = TERMINATION;
end

TERMINATION:

begin

abb_reg_ = 1'b1; // Relinquish the address bus.
abb_en_ = 1'b0;

next_astate = WAIT_FOR_NOT_FETCH;
end

WAIT_FOR_NOT_FETCH:

begin

if (fetch == 1'b1)

next_astate = WAIT_FOR_NOT_FETCH;
else next_astate = A_IDLE;
end 

default: 

begin 

next_astate = dc_astate; 
end 

endcase 

end 



//DATA BUS INTERFACE 
assign qual_DBG_ = ~(DBB_ & !DBG_);



// burst_start latch 

stdand2 BS_AND (.IN0(send),.IN1(clk),.Y(bs_clk));
dff #(2,0,"AUTO","AUTO")

BurstStartReg (.CLK(bs_clk),.D(BURSTSTART),.Q(burst_start)); 

// Odd Parity Generator/Checker 
// epoch precompiled parityo_chkgen256 
parityo_chkgen256 DataParityGen 

(.D(data),.PIN(dparity),.ERROR(data_parity_error),.PGEN(dparity_gen));
assign DPE_ = ~data_parity_error;

//data registers 

stdbufinv TA_INV (.IN0(TA_),.Y(ta));

// Delay buffer required for timing of latch signals. Gates = 4 results in smallest layout area.
stddelaybuf #(1,4,"AUTO") LatchDelay0 (.IN0(latch0),.Y(latch0_delay));
stddelaybuf #(1,4,"AUTO") LatchDelay1 (.IN0(latch1),.Y(latch1_delay));
stddelaybuf #(1,4,"AUTO") LatchDelay2 (.IN0(latch2),.Y(latch2_delay));
stddelaybuf #(1,4,"AUTO") LatchDelay3 (.IN0(latch3),.Y(latch3_delay));
stdand3 #("CRITICAL")
DR0_AND (.IN0(clk),.IN1(latch0_delay),.IN2(ta),.Y(dreg0_clk));
stdand3 #("CRITICAL")
DR1_AND (.IN0(clk),.IN1(latch1_delay),.IN2(ta),.Y(dreg1_clk));
stdand3 #("CRITICAL")
DR2_AND (.IN0(clk),.IN1(latch2_delay),.IN2(ta),.Y(dreg2_clk));
stdand3 #("CRITICAL")
DR3_AND (.IN0(clk),.IN1(latch3_delay),.IN2(ta),.Y(dreg3_clk));
stdbuf #("CRITICAL") DR0_BUF (.IN0(dreg0_clk),.Y(dreg0_clk_buf));
stdbuf #("CRITICAL") DR1_BUF (.IN0(dreg1_clk),.Y(dreg1_clk_buf));
stdbuf #("CRITICAL") DR2_BUF (.IN0(dreg2_clk),.Y(dreg2_clk_buf));
stdbuf #("CRITICAL") DR3_BUF (.IN0(dreg3_clk),.Y(dreg3_clk_buf));
dff #(72,0,"AUTO","AUTO")
DataReg0 (.CLK(dreg0_clk_buf),.D({mux_out[63:0],DP}),
.Q({data[63:0],dparity[7:0]}));
dff #(72,0,"AUTO","AUTO")
DataReg1 (.CLK(dreg1_clk_buf),.D({mux_out[127:64],DP}),
.Q({data[127:64],dparity[15:8]}));
dff #(72,0,"AUTO","AUTO")
DataReg2 (.CLK(dreg2_clk_buf),.D({mux_out[191:128],DP}),
.Q({data[191:128],dparity[23:16]}));
dff #(72,0,"AUTO","AUTO")
DataReg3 (.CLK(dreg3_clk_buf),.D({mux_out[255:192],DP}),
.Q({data[255:192],dparity[31:24]}));

//multiplexer 

mux2 #(128,0,"AUTO","AUTO")
MUXA (.IN0({D,D}),.IN1(DATALINE[127:0]),.S0(mux_sel),.Y(mux_out[127:0]));
mux2 #(128,0,"AUTO","AUTO")
MUXB (.IN0({D,D}),.IN1(DATALINE[255:128]),.S0(mux_sel),.Y(mux_out[255:128]));

//dataline output buffer
stdbuf DATALINE_EN_BUFFER (.IN0(dataline_en_),.Y(dataline_en_buf_));
tribuf #(128,0,"AUTO","AUTO")
dataline_bufferA (.EN(dataline_en_buf_),.IN0(data[127:0]),
.Y(DATALINE[127:0]));
tribuf #(128,0,"AUTO","AUTO")
dataline_bufferB (.EN(dataline_en_buf_),.IN0(data[255:128]),
.Y(DATALINE[255:128]));

//data output buffers
tribuf #(64,0,"AUTO","AUTO")
data_buffer0 (.EN(d_en0_buf_),.IN0(data[63:0]),.Y(D));
tribuf #(64,0,"AUTO","AUTO")
data_buffer1 (.EN(d_en1_buf_),.IN0(data[127:64]),.Y(D));
tribuf #(64,0,"AUTO","AUTO")
data_buffer2 (.EN(d_en2_buf_),.IN0(data[191:128]),.Y(D));
tribuf #(64,0,"AUTO","AUTO")
data_buffer3 (.EN(d_en3_buf_),.IN0(data[255:192]),.Y(D));

stdbuf DEN0_BUF (.IN0(d_en0_),.Y(d_en0_buf_));
stdbuf DEN1_BUF (.IN0(d_en1_),.Y(d_en1_buf_));
stdbuf DEN2_BUF (.IN0(d_en2_),.Y(d_en2_buf_));
stdbuf DEN3_BUF (.IN0(d_en3_),.Y(d_en3_buf_));

tribuf #(8,0,"AUTO","AUTO")
dparity_buffer0 (.EN(d_en0_),.IN0(dparity_gen[7:0]),.Y(DP));
tribuf #(8,0,"AUTO","AUTO")
dparity_buffer1 (.EN(d_en1_),.IN0(dparity_gen[15:8]),.Y(DP));
tribuf #(8,0,"AUTO","AUTO")
dparity_buffer2 (.EN(d_en2_),.IN0(dparity_gen[23:16]),.Y(DP));
tribuf #(8,0,"AUTO","AUTO")
dparity_buffer3 (.EN(d_en3_),.IN0(dparity_gen[31:24]),.Y(DP));

stdtribuf dbb_buffer (.EN(dbb_en_),.IN0(dbb_reg_),.Y(DBB_));
stdtribuf ta_buffer (.EN(ta_en_),.IN0(GND),.Y(TA_));
stdbuf #("26") CANX_BUF (.IN0(cancel),.Y(CANX));

//DATA FINITE STATE MACHINE 

parameter // epoch enum dstat 
D_IDLE = 5'd0,
WAIT_FOR_DBG = 5'd1,
FIRST_BEAT = 5'd2,
SECOND_BEAT = 5'd3,
THIRD_BEAT = 5'd4,
FOURTH_BEAT = 5'd5,
FETCH_TERMINATE = 5'd6,
UPLOAD1 = 5'd7,
ABORT1 = 5'd8,
D_WAIT_FOR_NOT_FETCH_A = 5'd9,
D_WAIT_FOR_NOT_FETCH_B = 5'd10,
START_SEND = 5'd12,
SEND00 = 5'd13,
SEND01 = 5'd14,
SEND02 = 5'd15,
SEND03 = 5'd16,
SEND10 = 5'd17,
SEND11 = 5'd18,
SEND12 = 5'd19,
SEND13 = 5'd20,
SEND20 = 5'd21,
SEND21 = 5'd22,
SEND22 = 5'd23,
SEND23 = 5'd24,
SEND30 = 5'd25,
SEND31 = 5'd26,
SEND32 = 5'd27,
SEND33 = 5'd28,
SEND_TERMINATE = 5'd29,
dc_dstate = 5'bxx;

reg [4:0] /* epoch enum dstat */ dstate, next_dstate; 

reg cancel,dbb_reg_,dbb_en_,dataline_en_,d_en0_,d_en1_,d_en2_,d_en3_,
download,fetch_done,
fetch_abort,latch0,latch1,latch2,latch3,mux_sel,send_done,upload,ta_en_;

always @(posedge clk or negedge HRESET_)
begin

if (!HRESET_)
dstate = D_IDLE;
else

dstate = next_dstate;
end

always @(dstate or fetch or send or qual_DBG_ or TA_ or
data_parity_error or burst_start)
begin 

//default values 

cancel = 1'b0;
dbb_reg_ = 1'b1;
dbb_en_ = 1'b1;
dataline_en_ = 1'b1;
d_en0_ = 1'b1;
d_en1_ = 1'b1;
d_en2_ = 1'b1;
d_en3_ = 1'b1;
download = 1'b0;
fetch_done = 1'b0;
fetch_abort = 1'b0;
latch0 = 1'b0;
latch1 = 1'b0;
latch2 = 1'b0;
latch3 = 1'b0;
mux_sel = 1'b0;
send_done = 1'b0;
ta_en_ = 1'b1;
upload = 1'b0;

case (dstate) 

D_IDLE:
begin
if (fetch)

next_dstate = WAIT_FOR_DBG;
else if (send) next_dstate = START_SEND;
else next_dstate = D_IDLE; 
end 

WAIT_FOR_DBG: 

begin 

if (qual_DBG_=l’bO) 
next_dstate = FIRST_BEAT; 
else next_dstate = WAIT_FOR_DBG; 
end 

FIRST_BEAT:
begin
  dbb_reg_ = 1'b0;
  dbb_en_ = 1'b0;
  latch0 = 1'b1;
  if (TA_ == 1'b1)
    next_dstate = FIRST_BEAT;
  else
    next_dstate = SECOND_BEAT;
end

SECOND_BEAT:
begin
  dbb_reg_ = 1'b0;
  dbb_en_ = 1'b0;
  latch1 = 1'b1;
  if (TA_ == 1'b1)
    next_dstate = SECOND_BEAT;
  else
    next_dstate = THIRD_BEAT;
end

THIRD_BEAT:
begin
  dbb_reg_ = 1'b0;
  dbb_en_ = 1'b0;
  latch2 = 1'b1;
  if (TA_ == 1'b1)
    next_dstate = THIRD_BEAT;
  else
    next_dstate = FOURTH_BEAT;
end

FOURTH_BEAT:
begin
  dbb_reg_ = 1'b0;
  dbb_en_ = 1'b0;
  latch3 = 1'b1;
  if (TA_ == 1'b1)
    next_dstate = FOURTH_BEAT;
  else
    next_dstate = FETCH_TERMINATE;
end

FETCH_TERMINATE:
begin
  dbb_reg_ = 1'b1;
  dbb_en_ = 1'b0;
  if (data_parity_error == 1'b1)
    next_dstate = ABORT1;
  else
    next_dstate = UPLOAD1;
end

UPLOAD1:
begin
  dataline_en_ = 1'b0;
  fetch_done = 1'b1;
  upload = 1'b1;
  next_dstate = D_WAIT_FOR_NOT_FETCH_A;
end

ABORT1:
begin
  fetch_abort = 1'b1;
  next_dstate = D_WAIT_FOR_NOT_FETCH_B;
end

D_WAIT_FOR_NOT_FETCH_A:
begin
  dataline_en_ = 1'b0; // To meet the data hold requirements of the hsram
                       // in the Data List.
  fetch_done = 1'b1;
  if (fetch == 1'b1)
    next_dstate = D_WAIT_FOR_NOT_FETCH_A;
  else
    next_dstate = D_IDLE;
end

D_WAIT_FOR_NOT_FETCH_B:
begin
  fetch_abort = 1'b1;
  if (fetch == 1'b1)
    next_dstate = D_WAIT_FOR_NOT_FETCH_B;
  else
    next_dstate = D_IDLE;
end

START_SEND:
begin
  cancel = 1'b1;
  download = 1'b1;
  latch0 = 1'b1;
  latch1 = 1'b1;
  latch2 = 1'b1;
  latch3 = 1'b1;
  mux_sel = 1'b1;
  if (burst_start == 2'd0) next_dstate = SEND00;
  else if (burst_start == 2'd1) next_dstate = SEND11;
  else if (burst_start == 2'd2) next_dstate = SEND22;
  else if (burst_start == 2'd3) next_dstate = SEND33;
  else next_dstate = START_SEND;
end

SEND00:
begin
  ta_en_ = 1'b0;
  d_en0_ = 1'b0;
  next_dstate = SEND01;
end

SEND01:
begin
  ta_en_ = 1'b0;
  d_en1_ = 1'b0;
  next_dstate = SEND02;
end

SEND02:
begin
  ta_en_ = 1'b0;
  d_en2_ = 1'b0;
  next_dstate = SEND03;
end

SEND03:
begin
  ta_en_ = 1'b0;
  d_en3_ = 1'b0;
  send_done = 1'b1;
  next_dstate = SEND_TERMINATE;
end



SEND11:
begin
  ta_en_ = 1'b0;
  d_en1_ = 1'b0;
  next_dstate = SEND12;
end

SEND12:
begin
  ta_en_ = 1'b0;
  d_en2_ = 1'b0;
  next_dstate = SEND13;
end

SEND13:
begin
  ta_en_ = 1'b0;
  d_en3_ = 1'b0;
  next_dstate = SEND10;
end

SEND10:
begin
  ta_en_ = 1'b0;
  d_en0_ = 1'b0;
  send_done = 1'b1;
  next_dstate = SEND_TERMINATE;
end

SEND22:
begin
  ta_en_ = 1'b0;
  d_en2_ = 1'b0;
  next_dstate = SEND23;
end

SEND23:
begin
  ta_en_ = 1'b0;
  d_en3_ = 1'b0;
  next_dstate = SEND20;
end

SEND20:
begin
  ta_en_ = 1'b0;
  d_en0_ = 1'b0;
  next_dstate = SEND21;
end

SEND21:
begin
  ta_en_ = 1'b0;
  d_en1_ = 1'b0;
  send_done = 1'b1;
  next_dstate = SEND_TERMINATE;
end

SEND33:
begin
  ta_en_ = 1'b0;
  d_en3_ = 1'b0;
  next_dstate = SEND30;
end

SEND30:
begin
  ta_en_ = 1'b0;
  d_en0_ = 1'b0;
  next_dstate = SEND31;
end

SEND31:
begin
  ta_en_ = 1'b0;
  d_en1_ = 1'b0;
  next_dstate = SEND32;
end

SEND32:
begin
  ta_en_ = 1'b0;
  d_en2_ = 1'b0;
  send_done = 1'b1;
  next_dstate = SEND_TERMINATE;
end

SEND_TERMINATE:
begin
  next_dstate = D_IDLE;
end

default: 

begin 

next_dstate = dc_dstate; 
end 

endcase 
end



endmodule 
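The sixteen SEND states spell out the four possible beat orders explicitly. START_SEND branches on burst_start to SEND00, SEND11, SEND22 or SEND33, and each sequence then enables the data registers in wrap-around order; for example, SEND11, SEND12, SEND13 and SEND10 assert d_en1_, d_en2_, d_en3_ and d_en0_ in turn. Put another way, the enable asserted on each beat is the starting beat plus the number of beats already sent, modulo four. The following is a minimal Verilog-2001 sketch of that relationship only. The module send_order_sketch and its ports beats_sent and d_sel are hypothetical names, and the sketch does not model TA_, send_done or any other control output of the state machine above.

```verilog
// Hypothetical sketch, not part of the PRC design: it restates only the
// beat ordering of the SEND states, not their timing or control outputs.
module send_order_sketch (burst_start, beats_sent, d_sel, d_en_);
  input  [1:0] burst_start; // starting beat, as decoded in START_SEND
  input  [1:0] beats_sent;  // beats already driven onto the bus (0-3)
  output [1:0] d_sel;       // data register (0-3) that drives this beat
  output [3:0] d_en_;       // active-low enables, like d_en0_ .. d_en3_

  // The 2-bit sum drops its carry, so it wraps modulo 4: with
  // burst_start = 1 the order is 1, 2, 3, 0 (SEND11, SEND12, SEND13, SEND10).
  assign d_sel = burst_start + beats_sent;

  // Exactly one enable is driven low per beat.
  assign d_en_ = ~(4'b0001 << d_sel);
endmodule
```

The explicit-state version in the listing keeps every enable a direct decode of dstate; the arithmetic form here only restates the ordering and is not claimed to be smaller or faster.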



1. Odd Parity Checker/Generator With 256 Inputs 



/*******************************************************************************
 * ODD PARITY CHECKER AND GENERATOR
 *
 * Filename: parityo_chkgen256.v
 * Author:   Joseph R. Robert, Jr.
 * Date:     29FEB96
 * Revised:  29FEB96
 *
 * Purpose: This module checks the parity of the input data by comparing it
 * to the input parity. Parity is odd, including the parity bit. This module
 * also generates the parity for the input data in groups of eight input bits.
 ******************************************************************************/






module parityo_chkgen256 (D,PIN,ERROR,PGEN); 

// epoch set_attribute FTXEDBLOCK = 1 

input [255:0] D; 
input [31:0] PIN; 
output [31:0] PGEN; 
output ERROR; 

wire ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR;

parityo_chk64 parity_group_0
  (.D(D[ 63: 0]),.PIN(PIN[ 7: 0]),.ERROR(ERROR_0),.PGEN(PGEN[ 7: 0]));
parityo_chk64 parity_group_1
  (.D(D[127: 64]),.PIN(PIN[15: 8]),.ERROR(ERROR_1),.PGEN(PGEN[15: 8]));
parityo_chk64 parity_group_2
  (.D(D[191:128]),.PIN(PIN[23:16]),.ERROR(ERROR_2),.PGEN(PGEN[23:16]));
parityo_chk64 parity_group_3
  (.D(D[255:192]),.PIN(PIN[31:24]),.ERROR(ERROR_3),.PGEN(PGEN[31:24]));
stdor4 OR_A (ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR);



endmodule 









module parityo_chk64 (D,PIN,ERROR,PGEN); 
input [63:0] D;
input [7:0] PIN;
output [7:0] PGEN;
output ERROR;

wire ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR_4,ERROR_5,ERROR_6,ERROR_7,ERROR_A,
     ERROR_B,ERROR;

paritycgo #(8,0,"AUTO","1")
  parity_group_0 (.D(D[ 7: 0]),.PIN(PIN[0]),.ERROR(ERROR_0),.PGEN(PGEN[0]));
paritycgo #(8,0,"AUTO","1")
  parity_group_1 (.D(D[15: 8]),.PIN(PIN[1]),.ERROR(ERROR_1),.PGEN(PGEN[1]));
paritycgo #(8,0,"AUTO","1")
  parity_group_2 (.D(D[23:16]),.PIN(PIN[2]),.ERROR(ERROR_2),.PGEN(PGEN[2]));
paritycgo #(8,0,"AUTO","1")
  parity_group_3 (.D(D[31:24]),.PIN(PIN[3]),.ERROR(ERROR_3),.PGEN(PGEN[3]));
paritycgo #(8,0,"AUTO","1")
  parity_group_4 (.D(D[39:32]),.PIN(PIN[4]),.ERROR(ERROR_4),.PGEN(PGEN[4]));
paritycgo #(8,0,"AUTO","1")
  parity_group_5 (.D(D[47:40]),.PIN(PIN[5]),.ERROR(ERROR_5),.PGEN(PGEN[5]));
paritycgo #(8,0,"AUTO","1")
  parity_group_6 (.D(D[55:48]),.PIN(PIN[6]),.ERROR(ERROR_6),.PGEN(PGEN[6]));
paritycgo #(8,0,"AUTO","1")
  parity_group_7 (.D(D[63:56]),.PIN(PIN[7]),.ERROR(ERROR_7),.PGEN(PGEN[7]));

stdor4 OR_A (ERROR_0,ERROR_1,ERROR_2,ERROR_3,ERROR_A);
stdor4 OR_B (ERROR_4,ERROR_5,ERROR_6,ERROR_7,ERROR_B);
stdor2 OR_C (ERROR_A,ERROR_B,ERROR);

endmodule 
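Functionally, each group of eight data bits in the checker/generator above gets one odd-parity bit, and a group is in error when its eight data bits plus the received parity bit contain an even number of ones. The following behavioral sketch of parityo_chk64, written in Verilog-2001 for brevity, expresses this with reduction operators. It assumes, without confirmation from the Epoch cell library, that each paritycgo #(8,0,"AUTO","1") instance drives PGEN with the odd-parity bit of its eight inputs and raises ERROR when {PIN, D} has even parity; the module name parityo_chk64_beh is hypothetical.

```verilog
// Behavioral sketch of parityo_chk64 (hypothetical name parityo_chk64_beh).
// Assumes odd parity per eight-bit group, as stated in the module header;
// the exact behavior of the Epoch paritycgo cell is not confirmed here.
module parityo_chk64_beh (D, PIN, ERROR, PGEN);
  input  [63:0] D;
  input  [7:0]  PIN;
  output [7:0]  PGEN;
  output        ERROR;

  genvar g;
  generate
    for (g = 0; g < 8; g = g + 1) begin : byte_parity
      // Reduction XNOR is 1 when the byte holds an even number of ones, so
      // the byte together with PGEN[g] always holds an odd number of ones.
      assign PGEN[g] = ~^D[8*g +: 8];
    end
  endgenerate

  // A group is in error when its received parity bit differs from the
  // regenerated one; ERROR is the OR of all eight group errors, mirroring
  // the stdor4/stdor2 tree in the structural version.
  assign ERROR = |(PIN ^ PGEN);
endmodule
```

Because PGEN[g] is exactly the value PIN[g] must have for valid odd parity, an exclusive-OR of the two gives the same per-group result as checking all nine bits, so one XOR-and-OR expression covers the checking half of the module.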



2. Odd Parity Generator With 32 Inputs



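/*******************************************************************************
 * ODD PARITY GENERATOR
 *
 * Filename: parityo_gen32.v
 * Author:   Joseph R. Robert, Jr.
 * Date:     12FEB96
 * Revised:  29FEB96
 *
 * Purpose: This module generates an odd parity bit for each group of eight
 * inputs.
 ******************************************************************************/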

module parityo_gen32 (D,PGEN);

input [31:0] D;
output [3:0] PGEN;
wire [3:0] PGEN;

parityo #(8,0,"AUTO","1") parity_group_0 (.D(D[ 7: 0]),.PGEN(PGEN[0]));
parityo #(8,0,"AUTO","1") parity_group_1 (.D(D[15: 8]),.PGEN(PGEN[1]));
parityo #(8,0,"AUTO","1") parity_group_2 (.D(D[23:16]),.PGEN(PGEN[2]));
parityo #(8,0,"AUTO","1") parity_group_3 (.D(D[31:24]),.PGEN(PGEN[3]));

endmodule 
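Assuming the parityo cell generates the odd-parity bit of its eight inputs (an assumption about the Epoch library, not something stated in the listing), parityo_gen32 reduces to one reduction XNOR per byte: ~^ yields 1 when a byte holds an even number of ones, which makes the byte and its parity bit odd in total. The module name parityo_gen32_beh below is hypothetical.

```verilog
// Behavioral sketch of parityo_gen32 (hypothetical name parityo_gen32_beh).
// Assumes the parityo cell generates the odd-parity bit of its eight inputs.
module parityo_gen32_beh (D, PGEN);
  input  [31:0] D;
  output [3:0]  PGEN;

  // PGEN[i] covers D[8*i+7:8*i]; each bit is 1 when its byte has an even
  // number of ones, giving odd parity over the nine bits.
  assign PGEN = {~^D[31:24], ~^D[23:16], ~^D[15:8], ~^D[7:0]};
endmodule
```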



H. TEST RESULTS 



Host command: verilog 
Command arguments: 

-f verilo^arguments 

-v/tmp_mnt/h/joshua_u2/jrrobert/thesis/epoch/primlib.v 

prc.v 

prc_top.v 

sequencer4.v 

tarbiter.v 

tcpu.v 

testbench.v 

tmemory.v 

VERILOG-XL 2.1.2 log file created Mar 19, 1996 11:53:03 
VERILOG-XL 2.1.2 Mar 19, 1996 11:53:03 

Copyright (c) 1994 Cadence Design Systems, Inc. All Rights Reserved. 

Unpublished — rights reserved under the copyright laws of the United States. 

Copyright (c) 1994 UNIX Systems Laboratories, Inc. Reproduced with Permission. 

THIS SOFTWARE AND ON-LINE DOCUMENTATION CONTAIN CONFIDENTIAL INFORMATION 
AND TRADE SECRETS OF CADENCE DESIGN SYSTEMS, INC. USE, DISCLOSURE, OR 
REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF 
CADENCE DESIGN SYSTEMS, INC. 

RESTRICTED RIGHTS LEGEND 

Use, duplication, or disclosure by the Government is subject to 
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in
Technical Data and Computer Software clause at DFARS 252.227-7013 or 
subparagraphs (c)(1) and (2) of Commercial Computer Software — Restricted 
Rights at 48 CFR 52.227-19, as applicable. 

Cadence Design Systems, Inc. 

555 River Oaks Parkway 

San Jose, California 95134 
For technical assistance please contact the Cadence Response Center at
1-800-CADENC2 or send email to crc_customers@cadence.com 

For more information on Cadence's Verilog-XL product line send email to 
talkverilog@cadence.com 

Compiling source file "prc.v" 

Compiling source file "prc_top.v" 

Compiling source file "sequencer4.v" 

Compiling source file "tarbiter.v" 

Compiling source file "tcpu.v"

Compiling source file "testbench.v" 

Compiling source file "tmemory.v" 

Scanning library file "/tmp_mnt/h/joshua_u2/jrrobert/thesis/epoch/primlib.v"
Scanning library file "/tmp_mnt/h/joshua_u2/jrrobert/thesis/epoch/primlib.v"



Warning! Implicit wire has no fanin
"prc.v", 23159: NC0
[Verilog-IWFA]

Warning! Implicit wire has no fanin
"prc.v", 23159: NC1
[Verilog-IWFA]

Warning! Implicit wire has no fanin
"prc.v", 23159: NC0
[Verilog-IWFA]

Warning! Implicit wire has no fanin
"prc.v", 23159: NC1
[Verilog-IWFA]



Highest level modules: 
testbench 

*** SDF Annotator version 1.6_beta.3
*** SDF file: /tmp_mnt/h/joshua_u2/jrrobert/thesis/verilog/hardware/prc.sdf
*** Back-annotation scope: testbench.PRC1.PRC1
*** No configuration file specified - using default options
*** SDF Annotator log file: sdf.log
*** No MTM selection parameter specified
*** No SCALE FACTORS parameter specified
*** No SCALE TYPE parameter specified
Configuring for back-annotation...
Reading SDF file and back-annotating timing data...
*** SDF back-annotation successfully completed
PRC granted the data bus. 

(ERROR): WR and A are both unknown at time 6.700
(ERROR): WR and A are both unknown at time 6.700
(ERROR): WR and A are both unknown at time 6.700
(ERROR) WR transition to unknown and (din != MEM[a]) at time 7.000
  Instance: testbench.PRC1.PRC1.LM1.MRMA_list.hsram.inst1
(ERROR) WR transition to unknown and (din != MEM[a]) at time 7.000
  Instance: testbench.PRC1.PRC1.DL1.data_ram1.hsram.inst1
(ERROR) WR transition to unknown and (din != MEM[a]) at time 7.000
  Instance: testbench.PRC1.PRC1.DL1.data_ram0.hsram.inst1



System hard reset at time 35.

CPU started read from address 00000000 at time 45.
CPU read: 0001020304050607 at 211
CPU read: 08090a0b0c0d0e0f at 271
CPU read: 1011121314151617 at 331
PRC requested the bus.
CPU read: 18191a1b1c1d1e1f at 391
CPU started read from address 00000020 at time 420.
CPU read: 2021222324252627 at 556
CPU read: 28292a2b2c2d2e2f at 616
CPU read: 3031323334353637 at 676
CPU read: 38393a3b3c3d3e3f at 736
PRC granted the data bus.
CPU started read from address 00000180 at time 1215.
CPU read: 0001020304050607 at 1381
CPU read: 08090a0b0c0d0e0f at 1441
CPU read: 1011121314151617 at 1501
CPU read: 18191a1b1c1d1e1f at 1561
CPU started read from address 000001a0 at time 1665.
CPU read: 2021222324252627 at 1831
PRC requested the bus.
CPU read: 28292a2b2c2d2e2f at 1891
CPU read: 3031323334353637 at 1951
CPU read: 38393a3b3c3d3e3f at 2011
PRC granted the data bus.
CPU started read from address 00000040 at time 2490.
CPU read: 4041424344454647 at 2641
CPU read: 4041424344454647 at 2656
CPU read: 5051525354555657 at 2671
CPU read: 4041424344454647 at 2686
PRC requested the bus.
PRC granted the data bus.
CPU started write to address 000001c0 at time 3307.
CPU write beat 1: 7777777777777777 at 3322
CPU write beat 2: 8888888888888888 at 3488
CPU write beat 3: 1111111111111111 at 3548
CPU write beat 4: 3333333333333333 at 3608
CPU started read from address 00000060 at time 3765.
CPU read: 6061626364656667 at 3916
CPU read: 6061626364656667 at 3931
CPU read: 7071727374757677 at 3946
CPU read: 6061626364656667 at 3961
PRC requested the bus.
PRC granted the data bus.
CPU started read from address 000001c0 at time 4440.
CPU read: 7777777777777777 at 4606
CPU read: 8888888888888888 at 4666
CPU read: 1111111111111111 at 4726
CPU read: 3333333333333333 at 4786



L125 "testbench.v": $finish at simulation time 5035000
4 warnings 

158647 simulation events + 266655 accelerated events + 926440 timing check events 
CPU time: 6.1 secs to compile + 161.8 secs to link + 377.5 secs in simulation 
End of VERILOG-XL 2.1.2 Mar 19, 1996 12:15:44 